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configuration  is  feasible  and  desirable.  By  feasible  we  mean  that  it  is  possible 
to  configure  a  number  of  (slave)  minicomputers  each  of  which  Is  driven  by  identical 
database  management  software  and  controlled  by  a  (master)  minicomputer  for  concur¬ 
rent  operations  on  the  database  spread  over  the  disk  storage  local  to  the  slave 
computers.  This  approach  to  large  databases  may  be  desirable  because  only  off-the- 
shelf  equipment  of  the  same  kind  is  utilized  to  achieve  high  performance  without 
requiring  specially-built  hardware  and  because  identical  database  management  soft¬ 
ware  is  replicated  on  the  slave  computers..  The  approach  makes  the  capacity  growth 
and  performance  improvement  easy  because  duplicate  hardware  can  be  added  and  used 
with  replicable  software. 

To  study  the  feasibility,  we  research  into  the  software  architecture  issues 
and  the  hardware  limitations  of  the  master  and  slaves.  We  also  research  into  the 
replicable  software  for  the  slaves.  Since  these  slaves  are  to  be  operated  concur¬ 
rently  with  corresponding  single-channel  disks,  we  can  investigate  the  effects  of 
either  single-query-and-multiple-database-streams  or  raultiple-queries-and-multiple- 
database-streams  operations  for  performance  improvement.  To  study  the  desirabili¬ 
ty,  we  intend  to  consider  the  factors  related  to  the  problem  of  capacity  growth  and 
cost  effectiveness.  The  central  issue  may  be  whether  we  can  realize  a  high-perfor¬ 
mance  and  great-Capacity  database  management  system  with  a  multiplicity  of  mini¬ 
computers,  large  number  of  single-channel  disks  and  replicable  software  cost- 
effectively. 

Irr  this  report,  w«?"presentj  a  new  approach  to  the  solution  of  database  manage-r 
ment  problems  involving  database  growth  and  performance  enhancement.  A  system 
which  uses  a  multiplicity  of  conventional  minicomputers,  novel  hardware  configura¬ 
tion  and  innovative  software  design  is  presented.  This  extensible  system  tries  to 
achieve  the  ideal  goal  of  having  the  performance  (both  response  time  and  through¬ 
put)  be  proportional  to  the  multiplicity  of  minicomputers. 

”r^c^0trr  first  effort  is  to  identify  the  problems  and  bottlenecks  involved  in  de¬ 
veloping  such  an  ideal  system.  Two  major  problems,  one  called  the  controller  li¬ 
mitation  problem  and  the  other  the  channel  limitation  problem  are  identified. 

Having  identified  these  problems,  j&r  next  effort  is  to  systematically  eliminate  or 
minimize  the  ill-effects  of  these  problems.  ^We~have  also  identified,^  number  of 
other  problems.  • 

For  studying  the  multiple  back-end  database  system,  we  utilize  queueing  models 
and  simulation.  Queueing  models  and  simulation  are  used  at  different  design  stages 
in  order  to  aid  the  design  process.  Finally,  ours  is  the  only  comprehensive  design 
of  a  multiple  backend  system  that  covers  all  aspects  of  database  management.  Algo¬ 
rithms  for  the  four  basic  request,  types  (insert,  retrieve,  delete  and  update)  algo¬ 
rithms  for  aggregate  operations,  algorithms  for  access  control,  algorithms  for  con¬ 
currency  control,  algorithms  for  database  reorganization  and  algorithms  for  addi¬ 
tion  of  new  backends  are  all  analyzed  and  specif ied  - 

Because  of  its  volume,  the  design  and  analysis  is  presented  in  two  parts.  In 
the  Part  I,  we  include  four  chapters.  In  Chapter  I,  we  set  our  aim  and  scope  of 
the  research.  In  Chapter  II,  we  will  make  a  survey  of  typical  software-oriented 
multiple  backend  systems  in  existence.  We  will  point  out  the  strengths  and  weak¬ 
ness  of  these  systems  and  make  some  recommendations  so  that  the  weaknesses  point¬ 
ed  out  for  some  of  the  systems  can  be  avoided.  One  of  the  recommendations  is  on 
data  placement  strategies.  As  a  multlple-data-stream  computer,  the  multi-backend 
database  system  must  be  able  to  move  data  across  the  backends  parallelly.  Thus, 
the  placement  of  the  database  over  separate  disk  systems  for  the  corresponding 
backends  constitutes  an  important  issue  for  study.  We  have  proposed,  and  Analyzed  a 
number  of  strategies  for  data  placement.  We  also  select  one  for  our  system.  In 
Chapter  III,  we  will  select  a  data  model  and  data  manipulation  language  for  imple¬ 
mentation.  We  argue  that  the  attribute-based  model  is  the  "superior"  or  "most  ap¬ 
propriate"  data  model.  Our  main  argument  is  that  prevailing  data  models  and  lan¬ 
guages  such  as  relational,  network  and  hierarchical  can  be  converted  and  translated 
into  the  attribute-based  model  in  a  straightforward  manner,  thus,  allowing  the  sys¬ 
tem  to  support  a  number  of  models  and  languages.  Accordingly,  a  simple  data  mani- 
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pulation  language  based  on  this  attribute-based  model  is  chosen  and  pre¬ 
sented.  In  Chapter  IV,  we  elaborate  on  the  process  of  request  execution. 
These  are  the  requests  for  record  retrieval,  insertion,  deletion  and  up¬ 
date.  Central  to  the  process  of  request  execution  is  the  study  of  infor¬ 
mation  about  the  database.  We  propose  a  new  method  for  record  clustering 
and  a  new  strategy  for  directory  management.  In  particular,  the  directory 
response  time,  throughput  and  storage  requirements  are  analyzed.  A  closed 
queueing  network  model  which  incorporates  a  separate  I/O  submodel  for  the 
disk  subsystem  is  used  for  this  study. 

The  rest  of  the  design  and  analysis  will  be  included  in  Part  II  of 
thi?  report.  The  entire  design  and  analysis  are  based  on  analytical  and 
simulation  studies.  An  implementation  effort  will  be  on  its  way.  It  is 
hoped  that  we  can  test  the  system  experimentally  soon. 
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1.  INTRODUCTION 

For  solving  database  management  problems,  many  researchers  [Babb79,  Bane78, 
Cope73,  Schu79,  Wah80]  have  proposed  solutions  util  zing  special-purpose  hard¬ 
ware,  known  as  database  machines.  However,  it  has  ->t  yet  been  demonstrated 
that  these  database  machines  are  cost-effective.  A  different  approach  to  the 
solution  of  database  management  problems  may  involve  the  use  of  conventional 
hardware  elements,  perhaps  in  large  number,  with  most  ot  the  database  manage¬ 
ment  functions  carried  out  in  software.  In  other  words,  we  may  be  searching 
for  hardware  solutions  without  having  considered  all  possible  software  solu¬ 
tions  to  database  management  problems.  In  this  dissertation,  we  emphasize  the 
software  approach  to  the  solution  of  database  management  problems. 

1.1  The  Coal 

The  goal  is  to  investigate  whether  it  is  possible  to  use  a  multiplicity  of 
general-purpose  processing  and  storage  elements  (specifically,  minicomputers 
and  disk  drives),  novel  hardware  configuration  and  innovative  software  design 
for  achieving  throughput  gain  and  response-time  improvement  over  conventional 
database  management  systems.  Ideally,  the  performance  gains  and  improvements 
should  be  proportional  to  the  multiplicity  of  processing  and  storage  elements 
used. 

If  this  ideal  goal  cannot  be  achieved,  then  the  case  for  database  machines 
that  use  special-purpose  hardware  becomes  very  strong.  On  the  other  hand,  if 
it  can  be  shown  that  such  an  ideal  system  can  be  obtained,  then  a  cost-effective 
way  for  high-volume  and  great-capacity  database  management  may  become  more 
readily  available.  Database  machines  may  still  provide  better  performance,  but 
they  will  be  more  expensive.  Furthermore,  database  machines  may  represent  a 
distant  solution  whose  time  is  not  yet  near. 

In  this  dissertation,  we  will  propose  a  hardware  configuration  of  multiple 
minicomputer  systems  and  a  design  of  a  software  database  management  system  for 
the  configuration.  We  will  use  analytical  techniques  and  simulation  studies  to 
determine  how  closely  the  ideal  goal  has  been  achieved  by  our  proposed  confi¬ 
guration  and  design. 

1.2  A  Taxonomy  of  Existing  Systems 

One  way  to  give  some  perspective  to  our  work  is  by  way  of  a  taxonomy  of 
database  systems  and  machines.  By  developing  the  taxonomy  and  indicating  the 
relative  position  of  our  work  within  this  taxonomy,  we  may  also  explain  the 
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similarities  and  differences  of  our  work  with  that  of  others  which  are  either 
operational  or  being  proposed.  Finally,  we  will  show  the  advantages  of  the 
software  approach  over  the  machine  approach.  The  taxonomy  we  developed  is 
shown  in  Figure  1. 

At  the  highest  level,  we  differentiate  between  systems  which  utilize  a 
central  controller  to  simplify  the  control  functions  and  those  which  do  not 
utilize  a  central  controller.  In  general,  it  is  the  case  that  systems  which 
do  not  utilize  a  central  controller  are  those  where  the  database  is  geographi¬ 
cally  dispersed  and  the  various  computers  have  to  be  connected  together  by  a 
network.  Examples  of  such  systems  are  SDD-1  [Roth80] ,  Distributed  Ingres 
[Ston76a],  and  Muffin  [Ston79].  Also,  generally  speaking, 
systems  that  do  not  use  a  central  controller  will  have  either  partially  or 
fully  duplicated  databases.  The  need  for  having  duplicate  databases  becomes 
important  in  a  geographically  dispersed  database  where  data  transfers  among 
computers  via  a  network  are  expensive.  Hence,  it  is  important  to  duplicate 
data  so  that  the  data  is  very  close  to  the  point  where  it  is  actually  needed. 

In  systems  with  duplicate  databases,  the  concurrency  control  algorithms  are 
mostly  complex  and  require  large  numbers  of  messages  to  be  exchanged  [Hsla81]. 
This  dissertation  is  not  concerned  with  systems  that  do  not  utilize  a  central 
controller  and  where  the  databases  are  duplicated,  though  this  could  very  well 
be  an  area  for  future  research. 

In  developing  our  taxonomy  further,  we  divide  the  systems  with  central  con¬ 
trollers  into  two  classes,  namely,  the  hardware-oriented  systems  and  the  soft¬ 
ware-oriented  systems.  Hardware-oriented  systems  (also  called  database  machines) 
are  those  which  typically  use  special-purpose  hardware  to  perform  a  large  number 
of  the  database  management  functions.  These  systems  [Bane78,  Cope73,  Schu79] 
process  the  data  ’on  the  fly'  while  the  data  is  being  read  from  the  disk  tracks. 
Special-purpose  processors  which  are  associated  with  blocks  of  secondary  sto¬ 
rage  are  utilized  to  perform  on-the-fly  processing.  Our  taxonomy  is,  of  course, 
subjective  in  defining  a  hardware-oriented  system  in  terms  of  the  number  of  the 
database  management  functions  being  accomplished  in  hardware.  What  is  a  ’large’ 
number  of  the  database  management  functions?  Clearly,  there  will  be  some  sys¬ 
tems  which  fall  in  the  borderline  between  hardware-oriented  and  software-oriented 
systems.  Note  that  we  do  not  consider  a  database  management  system  such  as 
DIRECT  [Dewi78]  to  be  a  hardware-oriented  system  even  though  it  is  often  referred 
to  as  a  database  machine  in  the  literature.  This  is  because  the  only  special- 
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purpose  hardware  in  DIRECT  is  a  crossbar  switch.  All  the  database  management 
functions  in  DIRECT  are  performed  by  software  in  conventional  processors. 

Hence,  by  our  classification,  DIRECT  is  a  software-oriented  database  management 
system. 

Software-oriented  systems  are  those  which  do  not  use  a  significant  amount 
of  special-purpose  hardware  and  where  most  of  the  functions  of  database  manage¬ 
ment  are  done  in  software.  Such  systems  may  be  further  subdivided  into  three 
categories  as  conventional  database  management  systems,  single  back-end  database 
management  systems  and  multiple  back-end  database  management  systems.  In  a 
conventional  database  management  system,  all  of  the  major  software  components, 
such  as  the  operating  system,  the  database  management  system  and  user  (applica¬ 
tion)  programs,  are  executed  in  a  single  computer  system  which  has  direct  access 
to  the  database  stored  in  the  secondary  memory.  Performance  upgrades  in  a  con¬ 
ventional  database  management  system  are  usually  costly  and  disruptive.  For 
instance,  it  may  be  necessary  to  add  large  memory  modules  or  to  incorporate  more 
channels  or  to  replace  the  central  processor  with  a  more  powerful  model.  Such 
upgrades  often  require  major  effort  so  that  it  is  well  accepted  that  conventional 
database  management  systems  are  not  easily  extensible.  In  this  context,  we  de¬ 
fine  extensibility  of  a  database  management  system  as  the  capability  of  the  sys¬ 
tem  for  upgrade  with 

(1)  no  modification  to  existing  software, 

(2)  no  additional  programming, 

(3)  no  modification  to  existing  hardware  and 

(4)  no  major  disruption  of  system  activity  when  additional  hardware  is 
being  incorporated  into  the  existing  hardware. 

A  new  configuration  for  database  management  called  the  back-end  configura¬ 
tion  was  proposed  in  [Cana74].  Excellent  descriptions  of  the  back-end  concept 
and  its  advantages  and  disadvantages  are  given  in  [Lowe76,  MarySO]  and  will  not 
he  reneated  here.  Briefly,  this  configuration  utilizes  two  computer  systems  — 
a  host  and  a  back-end.  The  database  management  functions  are  implemented  on 
the  back-end  which  has  exclusive  access  to  the  database  on  secondary  storage 
devices.  We  shall  refer  to  such  a  system  as  a  single  back-end  database  manage¬ 
ment  system,  since  multiple  back-end  systems  have  also  been  proposed  [Lowe76, 
Ston78,  Dewl78].  Among  the  advantages  claimed  of  single  back-end  database 
management  systems  is  that  performance  upgrades  are  less  disruptive,  more  manage¬ 
able  and  on  a  smaller  scale  than  in  a  conventional  database  management  system. 


Firsc,  upgrading  the  back-end  requires  no  modification  to  application  programs 
since  they  are  executed  In  the  host.  Furthermore,  the  back-end  separates  the 
characteristics  of  the  secondary  storage  devices  from  the  characteristics  of 
the  host.  Thus,  new  storage  technology  may  be  employed  without  changing  hard¬ 
ware  or  software  in  the  host.  However,  single  back-ends  have  the  disadvantage 
that,  ultimately,  performance  upgrades  will  require  replacement  of  the  back-end 
and  this  may  entail  software  modifications  and  hardware  disruption. 

The  next  logical  step  in  the  evolution  of  database  management  systems  is 
the  multiple  back-end  system  approach  as  suggested  by  a  number  of  researchers 
[Bane78,  Dewi78,  Lowe76,  Ston78].  This  approach  employs  multiple  computer  sys¬ 
tems  for  database  management  with  a  software-oriented  system  and  a  centralized 
controller.  A  few  words  regarding  the  differences  between  our  system  and  other 
software-oriented  multiple  back-end  systems  with  controllers  [Lowe76,  Ston78, 
Dewi78]  is  now  in  order.  The  work  of  [Lowe76]  did  not  represent  the  design  of 
any  specific  database  management  system.  That  work  was  essentially  in  the  na¬ 
ture  of  an  idea  for  an  architecture  which  others  could  utilize  to  design  systems. 
The  works  of  [Dewi78,  Ston78] ,  however,  are  actual  designs.  The  difference  be¬ 
tween  our  work  and  that  of  these  two  researchers  are  explained  at  a  very  detailed 
level  in  Chapter2.  Here,  we  satisfy  ourselves  with  some  differences  in  terms  of 
overall  objectives.  None  of  the  above-mentioned  researchers  emphasize  the  ex¬ 
tensibility  of  the  multiple  back-end  approach  nor  do  they  seem  very  concerned 
about  designing  their  respective  multiple  back-end  systems  to  make  them  exten¬ 
sible.  Tn  [Lowe76]f  for  example,  the  multiple  back-end  system  is  considered  from 
the  viewpoint  of  enhanced  reliability  over  single  back-end  systems.  In  [Dewi78, 
Ston78],  however,  the  emphasis  is  on  the  general  notion  that  multiple  back-ends 
may  provide  greater  throughput  than  single  back-end  systems.  However,  we  be¬ 
lieve  in  more  specific  findings.  In  other  words,  we  believe  that  there  are  de¬ 
signs  which  can  achieve  the  ideal  goal  of  the  throughput  improvement  being  pro¬ 
portional  to  the  multiplicity  of  back-ends.  None  of  the  above-mentioned  sys¬ 
tems  has  revealed  such  a  design.  We,  on  the  other  hand,  reveal  a  design  of  a 
multiple  back-end  system  which  could  lead  to  a  response  time  being  inversely 
proportional  to  the  multiplicity  of  back-ends .  It  is  also  our  belief  that  our 
multiple  back-end  system  is  extensible.  That  is,  new  storage  devices  and  back¬ 
ends  may  be  added  to  the  configuration  to  Improve  its  storage  and  performance 


capabilities  without  the  need  of  re-designing  and  re-programming  of  the  soft¬ 
ware  and  without  major  disruption  of  the  existing  hardware.  We  will  be  inter- 
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ested  In  multiple  back-ends  from  the  viewpoint  of  extensibility  to  improve 
response  time  and  throughput.  Furthermore,  we  will  not  utilize  any  special- 
purpose  hardware  in  our  system. 

As  a  result  of  the  extensible  nature  of  our  system,  we  make  the  following 
claim  about  our  system.  Consider  a  user  of  our  database  management  system  who 
finds  that  the  system  is  saturated  due  to  database  growth  and  transaction  in¬ 
crease.  The  user  would  like  to  upgrade  the  present  system.  The  designers  of 
database  machines  and  conventional  database  management  systems  would  offer  the 
user  the  alternative  of  using  their  database  machines  and  their  systems,  re¬ 
spectively,  thereby  replacing  the  existing  software  and  hardware.  Instead,  we 
offer  the  user  a  different  alternative.  We  do  not  ask  the  user  to  replace  the 
existing  software,  since  existing  software  is  extensible  and  requires  only  a 
system  generation.  We  also  do  not  ask  the  user  to  replace  the  existing  hard¬ 
ware.  We  merely  ask  the  user  to  add  identical  hardware;  consequently,  when 
and  if  the  need  to  upgrade  the  system  arises,  the  user  simply  adds  more  hard¬ 
ware  and  uses  the  same  existing  software  for  performance  gain  and  capacity 
growth . 

The  presentation  of  the  taxonomy  and  the  place  of  our  system  in  that  tax¬ 
onomy  has  made  clear  some  of  the  advantages  and  unique  characteristics  of  the 
software  approach  to  database  management  system  design  which  we  intend  to  fol¬ 
low.  Two  other  considerations  confirm  that  this  may  be  a  viable  alternative 
to  follow.  We  may  term  these  as  hardware  considerations  and  software  consi¬ 
derations.  Beth  of  these  are  considered  in  terms  of  our  software  approach  and 
the  database  machines  approach.  First  of  all,  from  a  hardware  point  of  view, 
the  conventional  processors  and  disk  drives  will  be  cheaper  than  special-purpose 
processors  and  disk  drives  which  may  be  needed  in  a  database  machine.  From  a 
software  point  of  view,  software  for  a  large  number  of  processors  may  be  needed 
in  this  approach.  However,  novel  software  design  may  be  used  to  ensure  that 
the  software  in  each  of  the  minicomputers  is  identical.  Thus,  the  software  com¬ 
plexity  is  not  proportional  to  the  multiplicity  of  minicomputers.  Furthermore, 
we  are  trying  to  support  large  databases  that  evolve  and  grow  over  time.  This 
growth  process  is  generally  gradual.  However,  performance  upgrades  are  necessary 
from  time  to  time  for  such  database  growth.  We  have  already  indicated  that  our 
particular  approach  to  database  management  systems  can  lead  to  extensible  systems 
and  improved  performance  to  accomodate  the  growth  and  the  evolution.  Thus,  from 
a  software,  hardware  and  database  growth  point  of  view,  the  software  approach 


using  multiple  back-ends  seems  preferable  to  the  database  machines  approach. 
Thus,  such  an  approach  is  certainly  worth  investigating  and  that  is  the  focus 
of  this  dissertation. 

1.3  Design  Issues  To  Be  Studied 

Before  an  extensible  system  can  be  developed,  a  number  of  issues  related 
to  multiple  back-end  systems  must  be  resolved.  None  of  the  researchers  men¬ 
tioned  above  have  provided  solutions  to  all  these  issues,  though  some  research¬ 
ers  have  proposed  solutions  to  a  subset  of  them.  Let  us  enumerate  below  the 
various  issues  to  be  considered  in  the  design  of  a  multiple  back-end  system. 

The  issues  may  be  divided  into  three  broad  categories  as  hardware  issues,  sys¬ 
tem  issues  and  software  issues. 

1.3.1  Hardware  Issues 

A.  The  Back-end  Interconnection 

The  questions  to  be  resolved  here  are,  "What  is  the  optimal  way  of  inter¬ 
connecting  the  back-ends  together?",  and  "What  is  the  optimal  way  of  connecting 
the  host  to  the  back-ends?"  In  answering  these  questions,  we  must  recall  that 
the  throughput  of  multiple  processor  systems  increases  significantly  only  for 
the  first  few  additional  processors.  At  some  point,  the  throughput  actually 
begins  to  decrease  with  each  additional  processor.  Examples  of  this  phenomenon 
are  documented  by  [Chu80]  and  [Jenn77].  This  decrease  in  throughput  is  due  to 
excessive  interprocessor  communication  during  the  execution  of  a  single  task 
which  causes  a  saturation  effect  [Chu78],  We  must  try  to  avoid  this  excessive 
interprocessor  communication  in  designing  an  optimal  multiple  back-end  system. 

B.  The  Database  Store  Interconnection 

Another  issue  is  the  question  of  which  secondary  devices  should  be  con¬ 
nected  to  which  back-ends.  For  instance,  if  two  back-ends  and  two  disk  drives 
are  given,  would  the  configuration  shown  in  Figure  2a  where  each  back-end  is 
connected  to  exactly  one  disk  drive,  be  superior  to  the  configuration  of  Figure 
2b  where  each  back-end  is  connected  to  both  disk  drives?  The  latter  configura¬ 
tion  is  more  flexible  since  any  back-end  may  be  employed  to  execute  any  user 
request.  However,  it  is  more  expensive  in  terms  of  hardware  complexity.  Fur¬ 
thermore,  if  two  back-ends  can  access  the  same  data,  there  exists  the  problem 
of  deciding  which  back-end  should  be  allowed  to  access  a  particular  data  item. 


Figure  2a.  A  Configuration  Where  Each  Back-end  is 
Connected  to  Only  Certain  Disk  Drives 


Figure  2b.  A  Configuration  Where  Each  Back-end  is 
Connected  to  all  Disk  Drives 


Such  a  configuration  also  complicates  the  concurrency  control  issue,  since  one 
back-end  may  now  read  and  update  the  same  data  being  read  and  possibly  updated 
by  another  back-end. 


1.3.2  System  Issues 

A.  The  Data  Base  Placement 

Here,  the  issue  to  be  resolved  is  regarding  the  best  way  of  placing  the 
files  constituting  the  database  across  the  various  back-ends  in  order  to 
achieve  the  maximum  amount  of  parallel  access  for  the  system  during  read  and 
write  operations.  In  other  words,  how  should  the  database  be  partitioned? 

As  much  as  possible,  we  should  try  to  ensure  that  the  records  constituting  the 
response  set  to  a  user  request  are  not  stored  at  a  single  back-end.  Rather, 
these  records  in  the  response  set  should  be  distributed  evenly  across  the 
various  back-ends.  Such  a  placement  policy  should  lead  to  maximal  parallel  ac¬ 
cess.  Techniques  for  achieving  such  a  placement  policy  must  be  found. 

B.  The  Execution  Mode 

We  ask  whether  the  multiple  back-ends  should  execute  in  a  single- instruction- 
stream-multiple-data-stream  (SIMD)  fashion  or  in  a  multiple-instruction-stream- 
multiple-data-stream  fashion  (MIMD) .  That  is,  should  each  of  the  back-ends  be 
executing  the  same  request  but  on  different  data  (SIMD),  or  should  the  back¬ 
ends  be  executing  in  an  asynchronous  fashion  (MIMD).  For  example,  the  results 
of  our  study  in  [Meno80]  have  shown  us  that  MIMD  configurations  need  not  always 
be  superior. 

C.  Directory  Structure 

Many  database  management  systems  also  have  a  secondary  body  of  information 
(often  referred  to  as  the  directory)  which  is  used  to  decrease  the  search  space 
(and  therefore  time)  of  the  data  base.  Issues  to  be  resolved  here  are  whether 
a  directory  is  indeed  necessary  and  if  it  is  necessary,  then  what  form  should 
it  take?  Should  we  use  the  inverted  list  organization,  the  multilist  organiza¬ 
tion,  or  any  of  the  whold  spectrum  of  alternatives  as  elucidated  in  [Hsia70]? 

D.  The  Directory  Placement 

Having  decided  on  the  nature  of  the  directory,  we  next  need  to  answer  the 
following  question.  Where  Is  the  best  place  to  store  the  directory?  Should  It 


be  stored  at  the  controller,  in  one  of  the  back-ends,  or  in  all  of  the  back-ends? 
Should  it  be  duplicated  or  partitioned?  The  tradeoffs  to  be  considered  are 
those  of  storage  requirements  versus  reliability,  system  throughput  and  re¬ 
sponse  time. 

E.  The  Security  Issue 

An  important  issue  that  must  always  be  considered  in  the  design  of  a  data¬ 
base  system  is  how  security  is  to  be  enforced.  That  is,  the  question  of  de¬ 
ciding,  in  a  multi-user  environment,  who  should  have  what  access  to  which  data. 

F.  The  Data  Model 

Any  database  management  system  must  decide  what  data  model,  e.g.,  rela¬ 
tional,  network  and  hierarchical,  it  will  support.  This  decision  must  be  made 
irrespective  of  whether  we  have  a  conventional  system,  a  single  back-end  sys¬ 
tem  or  a  multiple  back-end  system.  However,  it  is  possible  that  some  issues 
peculiar  to  multiple  back-end  systems  may  impact  our  choice  of  data  model.  For 
example,  it  will  be  shown  that  it  is  easier  to  partition  the  database  if  it  is 
represented  in  one  kind  of  data  model  and  not  in  the  other  kind.  The  parti¬ 
tioning  criteria  is  relevant  only  in  the  context  of  multiple  back-end  database 
management  systems. 

G.  The  Data  Manipulation  Language 

Finally,  we  must  decide  upon  a  language  which  can  be  used  by  the  users  to 
manipulate  the  database  in  an  easy  fashion.  The  choice  of  a  data  manipulation 
language  will  be  closely  related  to  the  chosen  data  model. 

1.3.3  Software  Issues 
A.  The  Degree  of  Concurrency 

We  are  designing  an  architecture  that  supports  multiple  users  concurrently. 
At  each  back-end,  we  have  the  choice  of  processing  the  requests  from  these  multi¬ 
ple  users  in  an  interleaved  manner  or  one  at  a  time.  If  the  requests  are  not 
interleaved,  then  the  software  in  the  back-end  is  simplified.  However,  the 
price  to  be  paid  in  terms  of  Increased  user  response  time  and  low  back-end  uti¬ 
lization  may  be  too  high.  For  this  reason,  we  may  want  to  support  concurrent 
request  processing  at  each  back-end. 
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B.  Consistency  Control  and  Deadlock  Avoidance 

If  each  back-end  is  to  handle  multiple  requests  and  multiple  transactions, 
then  algorithms  must  be  developed  to  ensure  consistency  of  the  database  in  the 
presence  of  multiple  transactions.  These  algorithms  must  ensure  that  each 
transaction  behaves  as  if  it  were  the  only  one  in  the  system.  Consistency 
control  algorithms  may  or  may  not  permit  deadlocks  to  occur.  If  deadlocks  are 
permitted  to  occur,  other  algorithms  must  be  developed  for  detecting  and  re¬ 
covering  from  deadlocks. 

1.4  Terminology 

We  wish  to  end  this  section  with  a  brief  look  at  terminology.  Throughout 
the  remainder  of  this  report,  we  will  refer  to  the  multiple  back-end  system 
which  we  are  attempting  to  design  as  the  multi-mini  database  system  (MDBS).  In 
MDBS,  we  will  refer  to  one  of  the  minicomputer  systems  as  the  controller  which 
controls  the  actions  of  the  rest  of  the  minicomputer  systems  known  as  back-ends . 

The  throughput  of  MDBS  is  defined  as  the  average  number  of  user  (program)  re¬ 
quests  executed  by  the  system  in  a  second.  Throughputs  may  also  be  defined  for 
various  request  classes  by  a  straightforward  extension  of  the  definition  of 
throughput.  The  response  time  of  a  request  in  MDBS  is  the  time  between  the  ini¬ 
tial  Issuance  of  the  request  by  a  user  (or  a  user  program)  and  the  final  receipt 
of  the  entire  response  set  of  this  request  by  the  user  (program). 

1 . 5  Contributions  of  Research 

In  this  dissertation,  we  present  a  new  approach  to  the  solution  of  database 
management  problems  involving  database  growth  and  performance  enhancement.  A 
system  which  uses  a  multiplicity  of  conventional  minicomputers,  novel  hardware 
configuration  and  innovative  software  design  is  presented.  This  extensible  sys¬ 
tem  tries  to  achieve  the  ideal  goal  of  having  the  performance  (both  response  time 
and  throughput)  be  proportional  to  the  multiplicity  of  minicomputers. 

Our  first  effort  is  to  identify  the  problems  and  bottlenecks  involved  in  de¬ 
veloping  such  an  ideal  system.  Two  major  problems,  one  called  the  controller  limi¬ 
tation  problem  and  the  other  the  channel  limitation  problem  are  identified.  Having 
Identified  these  problems,  our  next  effort  is  to  systematically  eliminate  or  mini¬ 
mize  the  ill-effects  of  these  problems.  We  have  also  identified  a  number  of  other 
problems . 

For  studying  the  multiple  back-end  database  system,  we  utilize  queueing  models 
and  simulation.  Queueing  models  and  simulation  are  used  at  different  design  stages 
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order  to  aid  the  design  process.  Finally,  ours  is  the  only  comprehensive  de¬ 
sign  of  a  multiple  back-end  system  that  covers  all  aspects  of  database  manage¬ 
ment.  Algorithms  for  the  four  basic  request  types  (insert,  retrieve,  delete 
and  update),  algorithms  for  aggregate  operations,  algorithms  for  performing  re¬ 
lational  join,  algorithms  for  enforcing  security,  algorithms  for  enforcing  con¬ 
currency  control,  algorithms  for  database  reorganization  and  algorithms  to  be 
executed  at  the  addition  of  a  new  back-end  are  all  analyzed  and  completely 
specified. 

At  a  more  detailed  level,  the  following  contributions  may  be  cited.  A 
solution  to  the  task  partitioning  problem  for  multiple  back-end  database  sys¬ 
tems  has  been  presented.  Unlike  previously  proposed  solutions  such  as  the  one 
in  [Dewi78],  our  solution  minimizes  message  traffic  among  multiple  back-ends 
thus  improving  response  time  and  throughput.  Experiments  have  indicated  that 
the  proposed  solution  is  extremely  effective  for  a  broad  range  of  back-ends. 
Also,  for  the  first  time,  the  importance  of  a  broadcast  capability  in  multiple 
back-end  database  management  systems  has  been  clearly  demonstrated.  More  im¬ 
portantly,  the  very  negative  impact  of  not  having  a  broadcast  capability  has 
also  been  demonstrated.  Experiments  have  shown  that  for  systems  which  do  not 
have  a  broadcast  capability,  the  response  time  will  improve  only  with  the  first 
few  back-ends.  After  that,  the  response  time  actually  begins  to  increase  with 
each  additional  back-end. 

A  new  and  simple  algorithm  for  performing  joins  on  a  bus-type  architecture 
is  presented  and  thoroughly  analyzed.  Unlike  other  studies,  this  analysis  not 
only  includes  I/O  time,  but  it  also  incorporates  the  overlap  of  I/O  and  join 
processing.  Furthermore,  the  effects  of  main  memory  size  on  join  time  are  also 
included  in  the  analysis.  Similar  analyses  of  other  existing  algorithms  for 
join  processing  are  also  made  and  lead  to  some  surprising  results.  Some  other 
algorithms  are  actually  shown  to  perform  worse  with  increasing  number  of  back¬ 
ends.' 

The  algorithms  presented  for  aggregate  operations  are  not  very  different 
from  those  presented  for  other  systems  [Bora80,  Schu79].  What  is  different  is 
the  analysis  of  these  algorithms.  Once  again,  a  model  which  incorporates  I/O 
and  CPU  processing  overlap  is  used.  Furthermore,  for  the  first  time,  an  optimum 
value  for  the  number  of  back-ends  that  may  participate  in  an  aggregate  operation 
is  derived. 


Eight  schemes  are  presented  for  directory  management  in  the  new  system. 


-  13  - 


The  directory  management  schemes  are  different  from  those  for  any  other  system. 
These  schemes  are  compared  in  terms  of  system  response  time,  throughput  and  sto¬ 
rage  requirements.  A  queueing  model  is  used  for  this  purpose.  To  the  author's 
knowledge,  this  is  the  first  time  that  a  system  has  chosen  a  directory  manage¬ 
ment  policy  based  on  analysis  using  queueing  theory. 

The  queueing  network  model  used  is  a  closed  queueing  network  model.  This 
model  incorporates  a  separate  I/O  submodel  for  the  disk  subsystem.  The  I/O  sub¬ 
model  that  we  developed  was  simpler  than  other  existing  models  [Bard81,  Gotl73, 
Fran74]  of  disk  subsystems  in  that  we  were  able  to  obtain  a  closed  form  solution 
for  the  response  time.  Our  proposed  I/O  submodel  also  differs  from  all  other 
models  for  disk  subsystems  in  that  analysis  is  made  for  bulk  arrivals.  Our 
analysis  assumed  bulk  arrivals  because  of  the  fact  that  MDBS  allows  the  user  to 
issue  requests  in  a  query-based  language.  As  a  result,  a  user  query  would  re¬ 
quire  a  number  of  records  (and,  hence,  a  number  of  disk  tracks)  to  be  accessed 
at  the  same  time.  Finally,  a  unique  iteration  technique  is  employed  in  order 
to  incorporate  this  I/O  submodel  into  the  overall  closed  queueing  network  model. 

The  data  placement  policy  is  based  on  the  work  of  [Wong71,  Roth74].  However, 
it  differs  from  all  these  methods  in  many  important  respects  which  will  be  dis¬ 
cussed  in  Chapter  4.  Very  briefly,  it  differs  from  their  work  because  we  are 
trying  to  minimize  response  time  in  a  multiple  back-end  system  and  they  were  in¬ 
terested  only  in  single  computer  systems.  Experiments  have  been  conducted  which 
demonstrate  the  effectiveness  of  our  placement  policy.  Certain  theoretical  re¬ 
sults  of  our  optimal  placement  policy  are  also  presented. 

The  concurrency  control  scheme  is  executed  at  each  back-end  rather  than  at 
the  controller.  This  is  only  possible  because  of  the  peculiarity  of  our  archi¬ 
tecture.  In  a  system  like  DIRECT  [Dewi78],  concurrency  control  algorithms  have 
to  be  run  in  the  controller.  In  such  a  system,  the  performance  of  concurrency 
control  algorithms  cannot  be  improved  by  employing  more  back-ends.  The  design 
of  our  system  has  allowed  us  the  flexibility  to  improve  the  performance  of  con¬ 
currency  control  by  utilizing  more  back-ends.  We  define,  for  the  first  time,  a 
term  called  monolithic  consistency  to  describe  the  kind  of  consistency  required 
in  a  partitioned  database  (just  like  inter-  and  intra-consistency  are  needed  in 
a  centralized  database  and  inter-,  intra-  and  mutual  consistency  are  needed  in 
a  distributed  database).  A  solution  which  preserves  monolithic  consistency  is 
presented.  Our  solution  is  unique  in  a  number  of  ways.  First,  it  advocates 
the  use  of  four  lock  modes,  instead  of  the  traditional  two  lock  modes.  By  se- 
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parating  the  insert  and  delete  locks  from  the  update  locks,  we  achieve  a 
greater  degree  of  concurrency.  Another  contribution  is  the  identification  of 
permutable  and  compatible  requests  for  a  high-level  predicate-based  query 
language  (and  not  for  a  low-level  one  as  in  [Gard77]).  At  a  practical  level, 
a  method  for  enforcing  locking  when  using  a  predicate-based  query  language  is 
proposed.  Unlike  [Eswa76]  which  uses  predicate  locking,  our  scheme  is  much 
simpler.  Unlike  [Jord81] ,  the  scheme  allows  predicate-based  updates.  Unlike 
both  [Eswa76]  and  [Jord81],  the  scheme  is  deadlock  free.  Hence,  it  cannot 
suffer  from  the  'starvation  problem'  where  transactions  are  rolled  back  infi¬ 
nitely  and  are  not  guaranteed  to  complete. 

The  work  on  access  control  and  security  is  based  on  the  work  in  [MCca75]. 
However,  it  differs  from  [MCca75]  in  that  the  method  of  specifying  access  pri¬ 
vileges  is  simplified  and  the  security  atoms  (or  clusters)  are  formed  in  a  dif¬ 
ferent  way.  We  are  also  able  to  enforce  protection  down  to  the  attribute  level. 
A  further  extension  allows  for  the  protection  of  access  against  statistical  re¬ 
quests.  The  security-related  tables  are  stored  in  the  multiple  back-ends.  Each 
back-end  only  needs  to  store  a  subset  of  the  tables.  As  a  result,  the  response 
time  and  throughput  of  the  entire  system  is  improved.  Also,  for  the  first  time, 
we  present  a  comparative  study  of  this  security  enforcement  scheme  with  others 
in  the  literature.  Finally,  we  are  able  to  make  the  schemes  verifiable  [Down77] 
That  is,  by  guaranteeing  the  correctness  of  a  small  portion  of  the  database  man¬ 
agement  system,  we  are  able  to  guarantee  the  correctness  of  the  access  control 
mechanism.  The  scheme  borrows  from  the  ideas  of  [Down77].  However,  [Down77] 
is  a  verifiable  security  mechanism  which  does  not  support  value-dependent  pro¬ 
tection.  In  [Down77],  it  is  speculated  that  a  verifiable  mechanism  for  value- 
dependent  access  can  not  be  found.  Our  scheme  contradicts  this  claim.  One 
final  unique  feature  of  our  security  mechanism  is  that  it  is  the  only  one  where 
the  refusal  of  some  requested  access  is  guaranteed  to  lead  to  a  saving  in  terms 
of  reduced  number  of  accesses  to  the  database  store. 

The  rest  of  this  dissertation  will  be  organized  as  follows.  In  Chapter  II, 
we  will  make  a  survey  of  typical  software-oriented  multiple  back-end  systems  in 
existence.  We  shall  point  out  the  strengths  and  weaknesses  of  these  systems 
and  make  some  recommendations  for  MDBS  so  that  the  weaknesses  pointed  out  for 
some  of  the  systems  can  be  avoided.  In  Chapter  III,  we  argue  that  the  attribute 
based  model  is  the  "superior"  or  "most  appropriate"  data  model.  Accordingly,  a 
simple  data  manipulation  language  based  on  this  attribute  model  is  chosen  and 
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presented  formally.  In  Chapter  IV,  we  elaborate  on  the  database  placement 
strategy  which  was  proposed  in  Chapter  II  and  which  was  shown  to  be  optimal. 
Furthermore,  algorithms  for  record  retrieval,  insertion,  deletion  and  update 
which  make  use  of  this  strategy  are  presented.  In  Chapter  V,  we  design  al¬ 
gorithms  for  concurrency  control  which  are  deadlock  free.  In  Chapter  VI,  we 
deal  with  security-related  issues  in  the  MDBS.  In  Chapter  VII,  we  present  a 
theoretical  treatment  of  the  optimal  interval  at  which  the  database  of  MDBS 
should  be  reorganized  in  order  to  take  the  advantage  of  the  database  placement 
strategy.  Also  presented  is  a  theoretical  treatment  of  the  optimal  interval 
at  which  additional  back-ends  may  need  to  be  added  to  the  system.  At  this 
point,  the  details  of  MDBS  will  have  been  completely  specified.  Chapter  VIII 
is  then  a  simulation  model  of  MDBS.  Results  showing  the  expected  performance 
of  MDBS  as  a  function  of  the  number  of  back-ends  are  presented  in  these  sec¬ 
tions.  In  Chapter  IX,  we  will  present  our  final  conclusions. 
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2.  A  SURVEY  OF  TYPICAL  SYSTEMS  AND  A  STUDY  OF  SYSTEM  ISSUES  FOR  DESIGN 
DECISIONS 

In  this  section,  we  shall  survey  some  of  the  existing  multiple  back-end 
systems.  Their  relative  merits  and  demerits  will  be  discussed.  As  we  had 
indicated  in  Chapter  1,  our  interest  is  in  software-oriented,  multiple  back¬ 
end  systems  with  a  controller.  We  shall, therefore,  include  neither  distri¬ 
buted  database  systems  such  as  SDD-1  [Roth80]  and  Distributed  INGRES  [Ston76a] 
nor  hardware-oriented  systems  such  as  DBC  [Bane78b,  Kann77b,  Kann78] .  Based 
on  the  findings  of  the  survey,  we  shall  make  some  design  decisions  for  MDBS. 
Obviously,  in  our  design  for  MDBS,  we  will  seek  to  avoid  the  weaknesses  of  the 
systems  surveyed. 

The  systems  that  we  shall  survey  in  this  section  are  RDBM  [Auer80],  DIRECT 
[Dewi78],  Stonebraker ' s  machine  [Ston78]  and  DBMAC  [Miss80] .  Even  though  many 
of  these  systems  bear  titles  which  include  the  word  'machine',  they  all  fall 
into  the  category  of  software-oriented,  multiple  back-end  systems.  These  four 
systems  do  not  form  a  comprehensive  list  of  all  systems  that  fall  into  this 
particular  category.  However,  we  feel  that  these  four  systems  are  typical  of 
existing  software-oriented,  multiple  back-end  systems. 

This  survey  is  an  important  and  integral  part  of  the  dissertation.  We  are 
not  presenting,  in  this  survey,  all  the  details  of  each  of  the  systems  surveyed. 
Rather,  for  each  system  surveyed,  certain  key  observations  are  made.  The  idea 
is  to  point  out  the  problems  of  these  systems  so  that  they  will  be  used  as  a 
lesson  for  better  designing  of  MDBS. 

2.1  A  Survey  of  Typical  Software-Oriented,  Multiple  Back-ends 

2.1.1  RDBM  -  A  Relational  Database  System 

The  RDBM  consists  of  three  major  components  as  shown  in  Figure  3 

(a)  a  mass  storage  device  with  its  own  storage  manager, 

(b)  a  multiprocessor  system  consisting  of  special-function  processors 
working  on  a  large,  common  main  memory,  and 

(c)  a  general-purpose  minicomputer  controlling  the  different  hardware 
components  and  performing  the  preprocessing  of  the  requests. 

We  shall  review  each  of  these  three  major  components  in  the  following. 

The  mass  storage  device  consists  of  conventional  secondary  memories,  extended 
by  a  block  buffer,  the  secondary  memory  manager  and  several  processing  elements, 
known  as  restriction  and  update  processors  (RUPs).  A  retrieve  request  is  exe- 
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Figure  3.  RDBM  -  A  Relational  Database  System 
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cuted  as  follows  in  RDBM.  The  pages  relevant  to  the  retrieve  request  are 
identified  and  read  from  the  secondary  memory  to  the  secondary  memory  buffer. 

The  records  from  the  buffer  are  then  sent,  one  by  one,  to  the  next  available 
RUP.  The  RUPs  examine  the  records  to  determine  if  they  satisfy  the  criteria 
of  the  retrieval  request.  The  RUPs  forward  the  final  set  of  records  to  the 
main  memory.  We  note  that,  at  any  one  time,  all  the  RUPs  are  executing  the 
same  instruction  (retrieval  request)  but  on  different  data  (records).  A  num¬ 
ber  of  observations  may  be  made  on  this  architecture. 

A.  The  Problem  of  Channel  Limitations 

First  of  all,  we  observe  that  some  of  the  relevant  pages  in  the  secondary 
memory  to  be  searched  on  are  already  in  the  main  memory.  This  is  because  the 
RUPs  can  only  examine  records  in  the  secondary  memory.  Secondly,  it  is  clear 
that  the  ultimate  limitation  in  throughput  is  the  rate  at  which  records  can  be 
read  from  the  secondary  memory  to  the  RUPs  via  the  interconnecting  channel. 

Thus,  after  a  certain  point,  the  use  of  additional  RUPs  will  not  improve  the 
rate  at  which  retrieve  (update,  delete)  requests  are  executed  by  the  RDBM. 

We  call  this  problem  the  channel  limitation  problem .  We  would  prefer  a  system 
which  is  not  limited  by  the  speed  of  the  channel.  In  other  words,  the  through¬ 
put  will  not  be  limited  by  interconnections  among  the  secondary  store,  the  pro¬ 
cessors  and  the  main  memory. 

B.  The  Problem  of  Software  Specialization 

The  multiprocessor  component  of  RDBM  consists  of  a  number  of  special- 
function  processors  -  one  to  perform  relational  joins,  one  to  perform  sorting 
of  retrieved  records,  etc.  In  such  a  system,  the  workload  distribution  among 
the  special-function  processors  could  be  uneven.  For  instance,  if  a  large  num¬ 
ber  of  user  requests  requires  joins  to  be  performed,  but  none  of  these  requests 
requires  any  sorting  to  be  performed,  it  is  clear  that  the  sort  processor  will 
be  under-utilized  whereas  the  join  processor  will  be  over-utilized.  Thus,  the 
best  utilization  of  the  multiple  processors  is  not  being  obtained.  Furthermore, 
such  a  system  may  be  unreliable.  For  example,  the  loss  of  the  join  processor 
will  render  the  RDBM  incapable  of  doing  joins.  This  is  known  as  the  software 
specialization  problem. 

A  system  which  can  continue  to  perform  all  the  database  management  functions 
(perhaps,  in  a  degraded  mode)  in  spite  of  the  loss  of  a  processor  is  to  be  pre- 
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ferred  over  a  system  where  the  loss  of  a  database  management  function  means 
a  permanent  denial  of  that  function  to  the  user.  Therefore,  to  overcome  this 
unreliable  operation,  a  system  should  not  have  special-function  software  in 
the  various  processors.  Rather,  it  should  have  general-purpose  software  in 
the  various  processors  so  that  all  of  them  are  capable  of  doing  all  the  data¬ 
base  management  functions  like  sorting,  joining,  etc.  In  this  case,  we  can 
also  expect  a  more  even  distribution  of  the  workload  among  the  processors. 

In  fact,  the  best  possible  solution  would  be  to  have  identical  software  in 
all  the  processors.  Then,  additional  processors  may  be  added  to  the  system 
with  the  greatest  ease  because  no  new  software  has  to  be  designed  for  the  ad¬ 
ditional  processors. 

C.  The  Problem  of  Controller  Limitation 

The  minicomputer  in  RDBM  controls  the  actions  of  the  various  hardware 
elements  in  processing  a  user  request.  Furthermore,  it  performs  the  prepro¬ 
cessing  and  analysis  of  the  user  request  to  determine  the  pages  in  the  secon¬ 
dary  memcry  to  be  retrieved,  deleted,  etc.  This  makes  the  speed  of  the  mini¬ 
computer  a  limiting  factor  to  the  throughput  of  the  RDBM.  To  explain  this 
point,  consider,  for  simplicity,  that  all  user  requests  require  10  seconds  of 
minicomputer  CPU  time  irrespective  of  the  number  of  back-ends.  This  means 
that  RDBM  cannot  support  a  throughput  rate  which  is  greater  than  six  requests 
per  minute,  irrespective  of  how  many  processors  it  uses  to  speed  up  operations 
like  join  and  sorting.  We  shall,  henceforth,  refer  to  this  problem  as  the 
controller  limitation  problem. 

This  problem  may  also  be  examined  from  the  viewpoint  of  the  response  time. 
The  ideal  goal  is  to  achieve  a  system  where  the  response  time  improves  in  pro¬ 
portion  to  the  number  of  back-ends  used  in  the  system.  Here,  the  response  time 
may  be  considered  as  the  sum  of  the  controller  execution  time  and  the  back-end 
execution  time.  Addition  of  more  back-ends  can  reduce  the  back-end  execution 
time,  but  it  cannot  reduce  the  controller  execution  time  which  is  a  constant 
independent  of  the  number  of  back-ends.  Thus,  the  controller  execution  time 
must  be  kept  to  a  minimum,  if  we  are  to  achieve  our  ideal  goal.  This  leads  us 
to  the  conclusion  that  all  major  tasks  must  be  performed  in  the  back-end  pro¬ 
cessors  in  a  parallel  fashion  and  that  the  controller  must  perform  minimal  work. 
So,  we  would  like  the  preprocessing  of  user  requests  to  be  performed  by  the 
multiple  back-ends  in  such  a  way  that  if  there  are  n  back-ends,  the  total  time 


for  preprocessing  is  speeded  up  by  a  factor  of  n 


D.  The  Problem  of  Data  Model  Limitation 

RDBM  supports  the  relational  model  of  data.  Thus,  in  order  to  support 
other  data  models  such  as  the  hierarchical  and  the  network  model,  it  will  be 
necessary  to  convert  the  hierarchical  and  network  database  to  a  relational 
one.  Furthermore,  requests  issued  in  a  hierarchical  and  network  data  manipu¬ 
lation  language  must  be  translated  to  requests  in  the  relational  data  manipu¬ 
lation  language.  Some  researchers  have  solved  the  translation  problem  parti¬ 
ally  by  translating  a  subset  of  the  network  model  into  the  relational  model 
[Katz80] .  However,  solutions  to  the  problem  of  translating  the  entire  network 
model  into  relational  model  are  not  at  hand.  Thus,  the  fact  that  RDBM  only 
supports  the  relational  model  constitutes  a  limitation  of  RDBM.  This  is  the 
data  model  limitation  problem.  We  would  prefer  to  have  a  data  model  that  is 
canonical.  By  a  canonical  model,  we  mean  that  the  model  allows  the  entirety 
of  any  prevailing  data  model  (i.e.,  relational,  hierarchical  and  network)  to 
be  translated  into  the  data  model. 

2.1.2  DIRECT  -  A  Multiple  Back-end  Relational  System 

As  a  relational  system  depicted  in  Figure  4,  DIRECT  [Dewi78]  consists  of 
four  main  components:  a  controller,  a  set  of  query  processors,  a  set  of  CCD 
memory  modules  and  an  interconnection  matrix  between  the  set  of  query  proces¬ 
sors  and  the  set  of  memory  modules.  When  the  controller  receives  a  user  re¬ 
quest,  it  will  determine  the  number  of  query  processors  which  should  be  assigned 
to  execute  the  request.  If  the  relations  referenced  by  the  request  are  not  in 
the  CCD  memories,  the  controller  will  page  them  in  before  distributing  the  re¬ 
quest  to  each  of  the  query  processors  selected  for  its  execution. 

A  retrieve  request  is  executed  as  follows.  First,  the  controller  deter¬ 
mines  the  optimal  number  of  query  processors  that  must  be  utilized  to  execute 
the  request.  This  depends  upon  the  size  of  the  relations  involved  in  the  re¬ 
quest,  the  priority  of  the  request  and  the  number  of  currently  available  pro¬ 
cessors.  The  controller  then  sends  the  request  to  each  selected  processor 
along  with  information  about  the  relations  to  be  referenced.  The  controller 
also  creates  a  task  which  waits  for  a  completion  signal  from  each  query  pro¬ 
cessor.  When  all  query  processors  have  signalled,  the  waiting  task  will  trans¬ 
mit  the  results  of  the  request  to  the  user.  For  example,  let  us  assume  that 
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only  a  single  relation  is  referenced  in  the  retrieve  request.  Then,  each  pro¬ 
cessor  will  request  the  controller  for  a  page  of  this  relation.  The  controller 
will  search  the  pages  of  the  relation  from  the  secondary  storage,  place  the 
pages  in  the  CCD  memory  and  pass  the  CCD  memory  address  of  the  requested  page 
to  the  processor.  The  query  processor  may  now  access  the  page  using  the  inter¬ 
connection  matrix  and  perform  the  necessary  work  on  the  page.  Finally,  the 
query  processor  creates  a  temporary  relation  containing  the  selected  tuples. 

This  temporary  relation  will  eventually  be  given  to  the  user. 

A.  The  Problems  of  Hardware  Specialization 

We  see  that  DIRECT  overcomes  the  channel  limitation  problem  by  use  of  the 
interconnection  matrix.  Strictly  speaking,  this  limitation  will  be  overcome 
only  if  each  query  processor  can  access  any  part  of  the  secondary  memory.  This 
is  not  possible  in  DIRECT,  since,  only  the  controller  can  access  the  secondary 
disk  memory.  The  use  of  the  CCD  memories  as  a  large  cache  for  the  secondary 
disk  memory  alleviates  the  channel  limitation  problem  to  a  large  extent.  Iron¬ 
ically,  the  biggest  drawback  of  DIRECT  is  its  need  to  use  an  interconnection 
matrix  whose  cost  increases  as  the  product  of  the  number  of  query  processors 
and  the  number  of  CCD  page  frames.  While  the  switching  delays  of  this  inter¬ 
connection  matrix  do  not  significantly  affect  the  time  to  access  a  page  of  CCD 
memory  for  small  number  of  processors  and  page  frames,  such  switching  delays 
may, nevertheless,  become  significant  in  a  full-scale  system  with  many  query 
processors  and  page  frames.  Another  problem  with  the  interconnection  matrix  is 
that  it  is  not  easily  extensible.  For  example,  the  addition  of  a  new  CCD  page 
frame  will  require  modifications  to  the  selector  interfaces  at  each  query  pro¬ 
cessor,  whereas  the  addition  of  a  new  query  processor  will  require  modifica¬ 
tions  to  the  interfaces  at  each  CCD  page  frame.  Finally,  the  interconnection 
matrix  is  not  a  conventional  hardware  element  because  it  must  be  specially  de¬ 
signed.  This  is  the  hardware  specialization  problem. 

B.  The  Problems  of  Control  Message  Traffic  and  Controller  Limitation 

Each  time  a  new  page  is  needed  by  a  query  processor,  a  message  must  be  sent 
to  the  controller  and  a  message  must  be  received  from  the  controller.  These 
two  messages  may  be  considered  as  an  overhead  for  the  task  of  reading  a  page 
from  the  CCD  memory.  It  has  been  estimated  that  about  8,000  instructions  are 
needed  [Bora81]  to  send  a  message  from  the  controller  to  a  query  processor  or 
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vice  versa.  Thus,  approximately  16,000  instructions  have  to  be  executed  before 
a  page  may  be  read  from  the  CCD  memory.  Assuming  that  an  instruction  takes  1 
psec  to  execute,  16  msecs  .of  overhead  are  associated  with  the  task  of  reading  a 
page  from  the  CCD  memory.  The  task  of  reading  a  page  from  the  CCD  memory  only 
takes  12  msecs  [Dewi78].  Thus,  the  overhead  associated  with  the  task  of  reading 
a  page  from  secondary  memory  is  of  57%.  The  above  calculation  does  not  include 
the  time  taken  by  the  controller  to  search  the  relation  tables  [Dewi78]  for  the 
purpose  of  determining  the  next  page  nor  does  it  include  the  queueing  delays 
suffered  by  the  two  overhead  messages.  Hence,  the  overhead  for  the  task  of 
reading  a  page  from  CCD  memory  is  likely  to  be  greater  than  the  estimated  fi¬ 
gure  of  57%. 

Also,  the  present  configuration  of  DIRECT  does  not  permit  broadcasting  of 
requests  to  the  query  processors  from  the  controller.  As  a  result,  a  request 
which  is  to  be  executed  by,  say,  three  query  processors  whould  require  three 
separate  messages  to  be  sent  from  the  controller  to  the  query  processors  and 
this  would  require  approximately  (8,000  x  3  =*)  24,000  instructions  and  take  up 
about  24  msecs  of  controller  time.  Thus,  there  is  the  problem  of  control 
message  traffic. 

Finally,  we  point  out  that  DIRECT  most  definitely  suffers  from  the  con¬ 
troller  limitation  problem.  That  is,  the  controller  is  actively  involved  in 
many  phases  of  the  execution  of  a  request  -  in  the  query  analysis  phase,  in 
the  concurrency  control  operations,  in  getting  the  next  page,  etc.  -  so  that 
the  throughput  of  DIRECT  will  be  limited  by  the  speed  of  the  controller  in  the 
execution  of  its  various  tasks. 

C.  The  Problem  of  Multiple  Request  Execution 

The  DIRECT  approach  does  not  allow  a  query  processor  to  support  concurrent 
execution  of  multiple  requests.  For  instance,  consider  that  while  executing 
a  retrieve  request,  a  query  processor  requests  a  page  which  is  not  in  the  CCD 
memory.  Then,  the  query  processor  is  idle  until  the  page  is  loaded  into  a 
page  frame  in  the  CCD  memory.  Such  idling  could  have  been  avoided,  had  the 
query  processors  been  allowed  to  concurrently  execute  multiple  requests.  If 
each  query  processor  were  allowed  to  concurrently  execute  multiple  requests, 
more  complex  software  would  be  required  in  the  query  processors.  This  is  the 
argument  used  by  the  designers  of  DIRECT  for  not  allowing  concurrent  execution 
of  multiple  requests  in  the  query  processors.  While  we  understand  the  ratio- 


nale  behind  this  argument,  we  feel  that  that  was  not  an  alternative.  In  a 
multi-user  system,  the  response  time  is  of  the  utmost  importance.  By 
allowing  concurrent  execution  in  the  query  processors,  we  can  improve  the  re¬ 
sponse  time.  Furthermore,  a  large  fraction  of  the  requests  is  likely  to  be 
I/O  bound,  and  the  use  of  concurrent  request  execution  will  serve  to  increase 
the  utilization  of  the  query  processors  (i.e.,  the  back-ends).  Even  if  much 
of  this  anticipated  utilization  is  spent  in  overhead  activities  such  as  task 
switching,  concurrent  request  execution  is  likely  to  provide  a  performance  im¬ 
provement  . 

In  a  system  like  DIRECT,  concurrency  control  is  necessary  in  spite  of  the 
fact  that  individual  query  processors  do  not  support  concurrent  request  execu¬ 
tion.  This  is  because  all  the  query  processors  are  allowed  to  access  all  the 
pages  of  the  CCD  memory.  Hence,  two  query  processors  may  try  to  update  the 
same  page  unless  concurrency  control  is  enforced.  It  will  be  shown  that  con¬ 
currency  control  is  unnecessary,  if  the  back-ends  do  not  support  concurrent 
request  execution.  Concurrency  control  in  DIRECT  is  maintained  by  means  of 
lock  tables  residing  in  the  controller  and  requires  a  number  of  messages  to  be 
exchanged  between  the  controller  and  the  query  processors.  Furthermore,  lock¬ 
ing  is  done  at  the  granularity  of  a  relation  and  this  may  reduce  the  degree  of 
concurrency  achievable  in  DIRECT.  As  long  as  no  indices  are  maintained  and 
each  request  requires  the  retrieval  of  entire  relations  (as  in  the  case  of 
DIRECT),  it  is  difficult  to  have  a  finer  locking  granularity.  This  is  because  the 
two-phase  lock  protocol  [Eswa76]  requires  that  no  lock  be  released  until  the 
end  of  a  transaction.  Thus,  the  lock  on  the  first  page  of  a  relation  cannot 
be  released  until  the  lock  on  the  last  page  of  that  relation  is  set.  On  the 
other  hand,  a  non-two-phase  lock  protocol  may  have  to  be  used,  if  a  better 
locking  granularity  is  to  be  achieved. 

Another  drawback  of  DIRECT  is  that  after  a  query  processor  completes  exe¬ 
cuting  a  request,  it  cannot  immediately  start  executing  the  next  request.  This 
is  because  no  processor  maintains  a  queue  of  waiting  requests.  Only  the  con¬ 
troller  maintains  such  a  queue.  Consequently,  a  query  processor  must  first  wait 
until  the  results  of  the  previous  request  are  received  at  the  controller  and 
then  wait  till  the  controller  has  shipped  the  next  request  to  the  query  pro¬ 
cessor.  TVo  messages  of  16  msecs  are  required  between  the  end  of  execution 
of  one  request  and  the  start  of  execution  of  the  next.  We  would  prefer  to  have 
a  queue  of  waiting  requests  at  each  back-end.  While  such  a  strategy  is  not  ex- 
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pected  to  increase  the  throughput  or  decrease  the  response  time  dramatically, 
it  will  certainly  be  an  improvement  over  the  present  strategy  of  DIRECT.  Be¬ 
sides  the  savings  of  sixteen  msecs  we  had  mentioned,  the  overhead  of  main¬ 
taining  queues  has  been  removed  from  the  controller  and  passed  on  to  the  back¬ 
ends.  This  should  contribute  to  an  alleviation  of  the  controller  limitation 
as  well.  The  aforementioned  difficulties  characterize  the  problem  of  multiple 
request  execution. 

D.  The  Problem  of  Data  Model  Limitation 

Another  criticism  of  DIRECT  is  the  fact  that  it  supports  only  the  rela¬ 
tional  model  of  data.  As  has  been  previously  pointed  out,  more  research  needs 
to  be  done  before  hierarchical  and  network  models  may  be  supported  by  such  a 
system. 

In  DIRECT,  entire  relations  must  be  retrieved  in  order  to  answer  queries. 

On  the  other  hand,  a  system  which  uses  index  information  (like  inverted  lists 
on  selected  attributes)  will  be  able  to  retrieve  relevant  portions  of  relations 
and  save  valuable  secondary  storage  access  time. 

2.1.3  Stonebraker ' s  Machine  -  A  Distributed  Database  System 

A  schematic  of  this  system  is  shown  in  Figure  5.  It  consists  of  a  single 
controller  and  multiple  back-ends.  Each  back-end  is  connected  to  a  single  disk 
drive  [Ston78] .  The  controller  preprocesses  the  user  queries  and  performs  parsing 
and  decomposition  of  user  queries  into  requests  that  access  only  a  single  rela¬ 
tion.  Directory  information  is  stored  in  one  of  the  becl'-ends  which  is  desig¬ 
nated  as  the  special  back-end.  After  decomposing  a  query,  the  controller  acces¬ 
ses  the  directory  in  this  special  back-end  to  determine  the  back-ends  which  must 
be  utilized  to  execute  the  requests  and  then  sends  the  requests  to  these  back¬ 
ends.  After  the  back-ends  return  the  results  of  the  query  to  the  controller, 
the  controller  outputs  the  results  to  the  user  that  issued  the  query. 

A.  The  Problem  of  the  Back-end  Limitation 

The  first  thing  we  note  about  the  distributed  database  system  is  that  the 
channel  limitation  problem  does  not  exist.  This  is  because  the  multiple  back¬ 
ends  may  read  data  from  the  secondary  storage  simultaneously  via  the  multiple 
channels.  However,  unlike  DIRECT,  a  request  must  be  executed  by  one  or  more 
specific  back-ends.  For  instance,  if  a  request  requires  retrieval  of  informa¬ 
tion  stored  in  the  disk  drive  attached  to  the  first  back-end,  only  this  back- 


J 


hi;. 


-  27  - 


end  may  be  employed  In  order  to  execute  this  request.  Hence,  consider  the  fol¬ 
lowing  situation.  Two  requests  are  issued  one  after  another;  both  require  access 
to  a  relation  stored  entirely  at  the  disk  drive  attached  to  the  first  back-end. 
Then,  the  second  of  these  requests  must  wait  until  the  first  request  completes 
execution  even  though  many  of  the  other  back-ends  in  the  system  are  idle.  This 
characterizes  the  back-end  limitation  problem. 

B.  The  Problem  of  the  Specialized  Back-end 

The  placement  of  the  directory  at  a  specific  back-end  seems  quite  unwarrented. 
Such  a  scheme  requires  that,  for  every  request,  messages  must  be  sent  to  and  re¬ 
ceived  from  the  special  back-end  which  carries  the  directory.  The  problem  of 
specialized  back-end  idles  the  other  back-ends  of  the  system. 

C.  The  Problem  of  Controller  Limitation 

It  should  also  be  pointed  out  that  this  system  suffers  from  the  controller 
limitation  problem,  since  parsing  and  decomposition  of  queries  is  done  entirely 
at  the  controller  and  will  take  an  amount  of  time  which  is  independent  of  the 
number  of  back-ends  in  the  system. 

D.  The  Problem  of  Multiple  Request  Execution 

Another  disadvantage  of  this  system  is  that  the  back-ends  do  not  support 
concurrent  request  execution.  As  a  result,  overlap  of  I/O  time  with  proces¬ 
sing  of  other  requests  is  not  possible  and  back-ends  may  be  idle  for  large 
amounts  of  time  waiting  for  an  I/O  to  complete.  Consequently,  the  best  use  of 
processing  power  is  not  being  made. 

E.  The  Problem  of  Device  Limitation 

In  this  system,  each  back-end  is  only  connected  to  a  single  disk  drive. 

As  a  result,  very  large  databases  of  the  magnitude,  say,  of  101 0  bytes,  cannot 
be  supported,  since  that  would  require  hundreds  of  back-ends  and  would  make  the 
system  extremely  expensive.  It  would  seem  more  reasonable  to  allow  multiple 
disk  drives  to  be  attached  to  each  back-end.  This  is  the  device  limitation 
problem. 

F.  The  Problem  of  Control  Message  Traffic 

The  present  configuration  of  the  distributed  database  system  does  not 
allow  for  broadcasting  of  requests  to  the  back-ends.  Hence,  a  request  which 
requires  cooperation  among,  say,  three  back-ends  would  require  three  separate 
messages  from  the  controller,  i.e.,  (8,000x3=)  24,000  instrucitons  and  24  msecs 
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of  controller’s  CPU  time.  The  ability  to  broadcast  the  requests  to  the  back¬ 
ends  would  save  valuable  CPU  time. 

G.  The  Problem  of  Data  Model  Limitation 

Finally,  a  weakness  of  this  system  is  that  it  is  only  designed  to  support 
the  relational  model  of  data.  Furthermore,  as  in  DIRECT,  the  entire  relation 
has  to  be  retrieved  in  order  to  answer  an  access  request  for  a  portion  of  a 
relation.  As  a  result,  a  larger  amount  of  information  than  is  necessary  is 
retrieved  from  the  secondary  memory. 

2.1.4  DBMAC  -  An  Italian  Database  System 

A  view  of  this  system  is  shown  in  Figure  6.  Since  information  about 
this  system  was  obtained  through  private  communication  [Miss80] ,  the  details  are 
necessarily  sketchy.  The  overall  architecture  consists  of  a  controller  con¬ 
nected  to  a  set  of  back-end  computers  over  a  mass  bus.  The  set  of  back-end 
computers  is  also  connected  to  a  set  of  secondary  storage  devices  via  another 
mass  bus.  There  is  no  one-to-one  correspondence  between  the  back-ends  and  the 
secondary  storage  devices.  Each  system  task  (like  the  reqi  ist  preprocessing 
task,  the  concurrency  control  task,  etc.)  is  performed  by  a  set  of  modules. 

The  set  of  modules  for  a  system  task  are  placed  in  such  a  way  that,  as  far  as 
possible,  no  two  modules  of  a  task  are  placed  at  the  same  back-end.  Thus,  all 
the  system  tasks  are  executed  by  the  back-ends  in  a  distributed  fashion.  Each 
back-end  has  a  local  primary  memory  and  a  shared  primary  memory.  Communica¬ 
tions  among  the  back-end  processors  are  by  means  of  message  passing  over  the 
first  mass  bus. 

A.  The  Problem  of  Channel  Limitation 

The  first  observation  of  the  system  that  we  wish  to  make  is  that  its 
throughput  is  limited  by  the  speed  of  the  second  mass  bus  attached  to  the  se¬ 
condary  memory.  This  is  because  even  though  there  are  a  number  of  back-ends, 
they  cannot  access  different  portions  of  the  secondary  memory  simultaneously. 

In  other  words,  the  system  suffers  from  the  channel  limitation  problem.  The 
throughput  will  also  be  limited  by  the  speed  of  the  mass  bus  connected  to  the 
controller.  In  fact,  this  mass  bus  will  be  heavily  utilized,  since  all  the 
system  tasks  have  been  broken  up  into  modules  that  communicate  with  each  other 
via  this  mass  bus. 


w 


_  A _ 


B.  The  Problem  of  Software  Specialization 

Since  each  back-end  contains  a  separate  module  from  each  system  task,  it 
is  clear  that  the  software  in  the  different  back-ends  is  not  identical.  This 
leads  to  a  decrease  in  system  reliability,  because  the  loss  of  one  of  the  pro¬ 
cessors  will  render  the  system  incapable  of  performing  any  database  management 
function.  Furthermore,  such  a  system  is  not  easily  extensible.  The  addition 
of  a  new  back-end  will  require  the  redistribution  of  the  modules  among  the 
back-ends  which  may  be  a  time-consuming  and  non-trivial  task. 

C.  The  Problem  of  Back-end  Limitation 

Unlike  the  aforementioned  distributed  database  system,  any  of  the  back¬ 
ends  of  DBMAC  may  be  selected  to  perform  any  of  the  requests,  since  all  back¬ 
ends  have  access  to  the  entire  database.  Consider  the  following  example.  Let 
us  assume  that  each  back-end  has  enough  primary  memory  to  store  100  tracks  of 
information.  Also,  let  us  assume  that  there  are  n  back-ends  and  n  disk  drives 
and  that  each  drive  contains  1000  tracks.  Then,  the  probability  that  a  track 
needed  to  answer  a  request  is  in  primary  memory  is  (100/1, OOOn  *)  l/10n  (assuming 
that  there  is  an  equal  probability  of  accessing  any  track).  In  an  organization 
such  as  the  distributed  database  system  of  Stonebraker,  however,  the  probability 
that  a  track  requested  by  a  back-end  is  in  the  primary  memory  is  1/10.  This 
would  cause  the  back-ends  in  DBMAC  to  make  many  more  accesses  to  secondary  me¬ 
mory  than  the  back-ends  in  the  distributed  database  system  of  Stonebraker. 

D.  The  Problem  of  Data  Model  Limitation 

As  a  final  comment,  DBMAC  supports  only  the  relational  model  of  data. 


In  this  section,  we  will  present  the  overall  architecture  of  a  multiple 
back-end  database  system,  known  as  MDBS.  We  will  provide  the  arguments  which 
lead  us  to  this  particular  architecture.  More  specifically,  we  will  present  the 
step-by-step  development  of  the  architecture.  Whenever  it  is  necessary  to  make 
a  design  decision,  we  shall  use  simulation  studies  and  analytic  techniques  to 
examine  the  alternatives. 

2.2.1  Nine  Design  Goals 

In  terms  of  our  survey  of  typical  database  systems  presented  in  the  previous 
section,  we  set  nine  design  goals  for  MDBS.  First,  the  channel  limitation  pro¬ 
blem  should  not  be  present  in  MDBS  in  the  first  place.  Second,  the  controller 
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limitation  problem  must  be  alleviated  to  as  large  an  extent  as  possible.  Third, 
the  back-ends  must  execute  identical  software.  That  is,  all  of  them  must  be 
utilized  to  perform  all  the  database  management  functions.  The  problems  of 
software  specialization  and  back-end  specialization  can  then  be  eliminated. 

This  leads  to  increased  reliability  and  to  a  better  workload  distribution  as  has 
been  discussed.  It  also  leads  to  the  simplified  addition  of  more  back-ends. 

Fourth,  communications  among  the  back-ends  and  between  the  back-end  and  the  con¬ 
troller  must  be  kept  to  a  minimum.  Without  excessive  communications  the  through¬ 
put  of  MDBS  will  not  taper  off  after  the  first  few  additional  back-ends.  Conse¬ 
quently,  the  problem  of  control  message  traffic  will  not  exist.  Fifth,  we  re¬ 
solve  not  to  use  any  special-purpose  hardware  in  MDBS.  As  a  result,  the  pro¬ 
blem  of  hardware  specialization  will  not  exist.  Sixth,  we  propose  to  support 
concurrent  request  execution  in  our  back-ends  in  order  to  eliminate  the  problem 
of  multiple  request  execution.  For  our  seventh  goal,  we  resolve  to  overcome 
the  device  limitation  problem  by  attaching  more  than  one  disk  drive  per  back-end. 
Eighth,  we  will  design  MDBS  in  such  a  way  that  all  the  back-ends  will  participate 
in  the  execution  of  a  request.  As  a  result,  we  will  have  eliminated  the  back-end 
limitation  problem.  Finally,  our  ninth  goal,  we  resolve  to  overcome  the  problem 
of  data  model  limitation.  In  other  words,  we  will  propose  a  canonical  data  mo¬ 
del  into  which  all  prevailing  data  models  (such  as  relational,  hierarchical  and 
network)  can  be  fully  translated.  If  these  nine  design  goals  are  attained,  we 
believe  that  MDBS  may  come  close  to  being  an  Ideal  system  whose  reliability, 
performance  (i.e.,  the  throughput  and  the  response  time)  and  growth  will  be  pro¬ 
portional  to  the  number  of  back-ends  employed. 

2.2.2  Towards  an  Ideal  System  Architecture 

In  the  following  section,  we  will  show  how  we  prevent  the  channel  limita¬ 
tion  problem  from  occurring  in  MDBS.  The  proposed  solution  utilizes  the  tech¬ 
nique  used  in  the  distributed  database  system  of  Stonebraker.  By  superimposing 
a  data  placement  strategy  on  top  of  that  technique,  we  eliminate  the  back-end 
limitation  problem  in  MDBS  as  well.  This  strategy  is  only  briefly  explained  in 
Section  2.3,  since  it  is  expounded  in  great  detail  in  Chapter  IV.  Furthermore, 
the  data  placement  strategy  combined  with  the  use  of  a  broadcast  capability  is 
shown  to  prevent  the  occurrence  of  the  control  message  traffic  problem  in  MDBS. 

Next,  the  device  limitation  problem  is  eliminated  in  MDBS  by  attaching  multiple 
disk  drives  to  each  back-end.  In  the  final  section  of  this  Chapter,  we  will  show 
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that  the  problem  of  hardware  specialization  does  not  exist  In  MDBS  either. 

Thus,  five  of  the  nine  design  goals  mentioned  above  are  achieved  in  this 
Chapter. 

In  Chapter  III,  we  propose  a  canonical  data  model  into  which  all  pre¬ 
vailing  data  models  and  their  data  manipulation  languages  (such  as  relation¬ 
al,  hierarchical  and  network)  can  be  translated.  Thus,  we  eliminate  the  data 
model  limitation  problem  in  MDBS.  The  software  to  be  executed  at  each  back¬ 
end  is  described  in  Chapter  IV.  From  the  discussion  in  Chapter  IV,  it  will  be 
seen  that  the  problems  of  software  specialization  and  back-end  speciali¬ 
zation  can  be  eliminated  from  MDBS  and  the  goal  of  using  identical  software  can  be 
achieved.  In  order  to  alleviate  the  controller  limitation  problem,  the  direc¬ 
tory  management,  concurrency  control  and  secutity  enforcement  algorithms  are 
carefully  designed.  How  the  careful  design  of  these  three  algorithms  serves  to 
alleviate  the  controller  limitation  problem  is  explained  in  Chapters  IV,  V  and 
VI,  respectively.  The  discussion  of  Chapter  V  will  center  on  how  we  choose  to 
eliminate  the  problem  of  multiple  request  execution  in  MDBS. 

The  remainder  of  this  chapter  is  organized  as  follows.  In  Section  2.4, 
we  will  design  a  simulation  experiment  which  will  illustrate  the  importance  of 
having  a  broadcast  capability  in  MDBS.  Finally,  an  overview  of  the  basic  de¬ 
sign  and  architecture  of  MDBS  is  presented  in  Section  2.5. 


S 


2.3  First  Design  Decision  -  Eliminating  the  Channel,  Back-end  and 

Device  Limitation  Problems 

The  only  architectural  decision  we  have  made  with  regard  to  MDBS  up  to 
this  point  is  that  it  consists  of  a  controller  and  a  number  of  back-ends. 

More  decisions  regarding  the  MDBS  architecture  will  be  made  as  we  proceed. 

Let  us  now  try  to  design  MDBS  in  such  a  way  as  to  eliminate  any  occur¬ 
rence  of  the  channel  limitation  problem.  It  was  shown  that  RDBM  and  DBMAC 
both  suffered  from  this  problem,  whereas,  the  distributed  database  system  of 
Stonebraker  and  DIRECT  did  not.  DIRECT  overcame  the  problem  by  use  of  an 
interconnection  matrix  which  allowed  any  query  processor  to  access  any  CCD  page 
frame.  The  distributed  database  system  of  Stonebraker  overcame  the  problem  by 
using  a  separate  disk  drive  associated  with  each  back-end.  Thus,  the  technique 
developed  for  DIRECT  and  the  one  developed  for  the  distributed  database  machine 
are  good  candidates  for  our  consideration. 

The  technique  developed  for  the  distributed  database  system  may  be  at¬ 
tractive  owing  to  its  extreme  simplicity  and  low  cost.  Another  strong  point 


4 

♦- 


■ 


-33  - 


of  this  technique  is  that  the  concurrency  control  problem  is  alleviated  be¬ 
cause  two  different  back-ends  will  not  have  the  same  data  item  for  update  due 
primarily  to  the  use  of  dedicated  disk  drives.  In  spite  of  these  advantages, 
the  technique  may  be  unattractive  to  us  because  it  suffers  from  the  problem 
that  a  given  request  can  only  be  executed  at  the  back-end  attached  to  the  disk 
drive  on  which  the  data  relevant  to  the  request  resides.  This  is  what  we  have 
called  the  back-end  limitation  problem.  If  the  data  relevant  to  a  request  re¬ 
sides  on  a  single  disk  drive,  only  a  single  back-end  can  be  used  to  execute 
that  request.  In  other  words,  the  parallelism  that  is  present  in  the  system 
is  not  being  utilized  to  execute  the  request.  We  would  prefer  a  technique 
which  allows  all  the  back-ends  to  participate  in  the  execution  of  a  given  request 

DIRECT  is  a  system  which  allows  all  the  back-ends  to  participate  in  the 
execution  of  request.  They  achieve  this  by  bringing  the  data  relevant  to  the 
request  into  a  CCD  memory  which  is  accessible  from  all  back-ends.  However, 
such  a  technique  is  expensive,  since  it  requires  an  interconnections  matrix 
whose  cost  grows  as  the  product  of  the  number  of  back-ends  and  the  number  of 
CCD  memory  modules. 

We  therefore  wish  to  develop  a  technique  which  allows  all  the  back-ends 
to  participate  In  the  execution  of  a  request  on  the  one  hand  and  forgoes  the 
costly  interconnection  matrix  on  the  other  hand.  Our  solution  is  to  have  a 
system  with  dedicated  secondary  memories  and  to  store  the  data  in  such  a  way 
that,  whatever  the  request  is,  the  data  to  be  retrieved  for  satisfying  the 
request  is  evenly  distributed  among  the  back-ends.  Such  a  data  placement  stra¬ 
tegy  allows  all  the  back-ends  to  participate  in  the  execution  of  request,  by 
reading  data  from  the  secondary  memory  simultaneously  via  the  multiple  channels. 
Thus,  the  lack  of  parallelism  due  to  dedicated  devices  as  exemplified  in  the 
distributed  database  system  of  Stonebraker  does  not  occur  in  MDBS.  In  other 
words,  we  have  taken  the  technique  of  the  distributed  database  system  of 
Stonebraker  for  overcoming  the  channel  limitation  problem  and  superimposed  on 
it  a  placement  strategy  which  serves  to  avoid  the  back-end  limitation  problem. 

An  overview  of  MDBS  architecture  is  depicted  in  Figure  7.  Note  that  each  back¬ 
end  is  attached  to  multiple  disk  drives  for  the  elimination  of  the  device  li¬ 
mitation  problem. 

2.3.1  The  Need  for  a  Data  Placement  Strategy 

Let  us  illustrate  the  strategy  first  with  an  example.  Consider  a  file 
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with  six  records  as  shown  in  Figure  8.  In  the  figure,  an  MDBS  with  one  con¬ 
troller  and  two  back-ends  is  depicted.  In  order  to  keep  the  example  deliber¬ 
ately  simple,  we  make  the  following  simplifying  assumptions: 

(1)  Each  back-end  has  only  a  single  disk  drive. 

(2)  Each  disk  has  only  three  tracks. 

(3)  All  the  records  are  of  fixed-length  and  occupy  exactly  one  track. 

(4)  Each  record  contains  exactly  three  keywords. 

Consider  the  arbitrary  placement  of  recrods  in  Figure  8a,  and  assume  that 
MDBS  receives  the  following  retrieve  request. 

"Retrieve  all  records  which  satisfy  the  query  conjunction  (K1&K3)." 

The  query  will  be  forwarded  by  the  controller  to  the  two  back-ends.  Given 
the  record  placement  of  Figure  8a,  we  see  that  the  first  back-end  must  access 
two  tracks,  since  its  disk  contains  two  records  -  one  with  keywords  Kl,  K2  and 
K3,  and  one  with  keywords  Kl,  K3  and  K4  -  which  satisfy  the  conjunction  (K1&K3) . 
The  second  back-end,  on  the  other  hand,  does  not  need  to  access  its  secondary 
memory  at  all,  since  it  contains  no  records  which  will  satisfy  the  given  query 
conjunction.  Thus,  if  we  let  td  designate  the  time  to  access  and  read-out  a 
track  (i.e.,  seek  time  and  rotation  time)  and  we  ignore  the  directory  search 
time  and  the  time  taken  by  the  controller  to  broadcast  the  request,  then  the 
response  time  of  the  request  t^  is  equal  to  2td .  Can  we  do  better  than  this? 

In  the  previous  example,  the  poor  response  time  was  caused  by  an  uneven 
distribution  of  the  data  among  the  two  back-ends.  One  distribution  of  data 
which  leads  to  a  better  response  time  is  the  one  shown  in  Figure  8b.  The 
query  response  time,  in  this  case,  is  equal  to  td.  This  is  because  each  back¬ 
end  has  only  one  record  satisfying  the  given  query  conjunction  and  needs  to 
access  only  one  track.  Also,  the  two  back-ends  can  access  their  respective 
disks  simultaneously.  It  Is  this  type  of  parallel  operation,  combined  with 
the  even  workload  (i.e.,  data)  distribution  that  contributes  to  the  optimal 
response  time  tq. 

We  would  like  to  note,  however,  that  the  arrangement  of  records  as  shown 
in  Figure  8b  may  cause  the  response  time  of  some  other  query  (e.g.,  "retrieve 
all  records  which  satisfy  the  conjunction  (K1&K5)")  to  become  worse.  Thus, 
the  example  has  demonstrated  two  things  to  us.  First,  a  careful  data  place¬ 
ment  strategy  must  be  adopted  to  obtain  improved  response  time  over  that  which 
would  be  obtained  with  an  arbitrary  data  placement  strategy.  Secondly,  the 
strategy  must  be  good  for  all  the  types  of  requests  which  will  be  issued  against 
the  database. 
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Figure  8b.  An  Improved  Data  Placement  of  Records 
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2.3.2  An  Evaluation  of  Data  Placement  Strategies 

Three  data  placement  strategies  are  outlined  and  evaluated  in  this  section. 
These  are  the  exact  division  strategy,  the  track  splitting  with  placement  from  the 
first  back-end  strategy,  and  the  track  splitting  with  random  placement  strategy. 
The  differences  among  these  three  placement  strategies,  referred  to  simply  as 
Strategy  A,  Strategy  B  and  Strategy  C,  respectively,  are  shown  by  means  of  an 
example  developed  in  Figures  9.  In  Strategy  A,  the  records  in  the  response 
set  were  originally  divided  exactly  among  the  disk  drives  of  the  back-ends. 

Thus,  if  the  response  set  of  a  request  consists  of  five  tracks  of  data  and 
there  are  three  back-ends  as  depicted  in  Figure  9a,  each  back-end  will  contain 
(5/3=0  1.67  tracks  of  data.  Strategy  B  also  consists  of  dividing  the  records 
of  a  response  set  equally  among  the  back-ends.  However,  it  is  different  from 
Strategy  A  in  that  the  division  of  the  response  set  takes  place  at  tract  bound¬ 
aries.  Thus,  if  the  response  set  of  a  request  consists  of  five  tracks  of  re¬ 
cords  and  there  are  three  back-ends,  the  disk  drive  of  each  back-end  will  con¬ 
tain  one  track  of  data.  The  remaining  two  tracks  are  then  assigned  to  disk 
drives  of  the  first  two  back-ends.  In  other  words,  the  disk  drive  of  the  first 
back-end  contains  two  tracks  of  the  response  set,  the  disk  drive  of  the  second 
back-end  contains  two  tracks  of  the  response  set  and  the  disk  drive  of  the 
third  back-end  contains  only  one  track  of  the  response  set  as  illustrated  in 
Figure  9b.  Strategy  C  is  similar  to  Strategy  B  in  that  the  division  of  the 
response  set  takes  place  at  track  boundaries.  It  differs  from  Strategy  B  in 
that  the  left-over  data  after  exact  division  of  the  data  in  terms  of  tracks 
among  the  back-ends  are  assigned  to  the  disk  drive  of  arbitrary  back-ends. 

Thus,  in  the  example  where  the  response  set  of  a  request  consists  of  five 
tracks  of  records  and  there  are  three  back-ends,  the  disk  drive  of  each  back¬ 
end  will  contain  initially  one  track  of  data.  The  remaining  two  tracks  are 
then  assigned  to  the  disk  drives  of  the  two  back-ends  picked  randomly  and  not 
necessarily  to  the  disk  drives  of  the  first  two  back-ends  as  in  Strategy  B. 

The  situation  is  depicted  in  Figure  9c. 

Four  simulation  models  of  MDBS  are  developed  using  SIMULA  on  a  DEC  System 
20.  First,  we  test  out  the  simulation  model  in  which  there  is  no  data  place¬ 
ment  policy.  That  is,  no  assumption  is  made  regarding  where  the  response  set 
of  a  request  is  located.  One  or  more  back-ends  may  be  employed  in  the  execu¬ 
tion  of  a  request  depending  upon  where  the  response  set  to  the  request  is 
stored.  Parallelism  will  be  utilized  only  if  more  than  one  back-end  needs  to 
be  employed.  Such  a  simulation  model  would  approximate  a  system  like  the  dis¬ 
tributed  database  system  of  Stonebraker  [Ston78],  where  no  special  placement 
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Figure  9.  Three  Different  Data  Placement  Strategies 
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policy  is  employed.  In  the  simulation  model,  a  random  number  generator  is 
used  to  determine  how  many  back-ends  will  participate  in  answering  the  re¬ 
quest  and  the  request  is  then  sent  to  these  many  back-ends.  The  remaining 
three  simulation  models  simulate  MDBS  under  the  three  aforementioned  place¬ 
ment  strategies. 

For  all  four  simulation  models,  each  back-end  has  a  queue  of  requests 
which  are  executed  in  a  first-in-and-fir st-out  basis.  Also,  the  time  to 
broadcast  a  request  and  the  time  to  return  the  results  to  the  controller  are 
ignored,  since  we  are  not  interested  in  modelling  message  traffic  at  this 
point  (that  will  be  done  later).  Furthermore,  at  a  back-end,  the  time  taken 
to  retrieve  records  from  the  secondary  memory  is  assumed  to  be  dominating  the 
execution  time  of  a  request.  Thus,  CPU  execution  time  is  ignored.  Finally, 
in  all  four  models  we  make  the  assumption  that  a  request  is  never  satisfied 
by  the  data  that  is  already  in  the  memory  buffers  of  the  bacic-end.  The  re¬ 
quest  always  requires  the  data  to  be  retrieved  from  the  secondary  memory. 

While  such  an  assumption  implies  a  worse  case  situation,  it  has  the  advantage 
that  factors  such  as  good  buffering  techniques  will  not  affect  our  results. 

The  results  of  our  simulation  studies  are  tabulated  in  Tables  I  and  II. 

It  is  assumed  that  25%  of  the  requests  generated  are  'insert  record'  requests 
and  the  remaining  are  delete,  update  and  retrieve  requests.  Each  of  the  three 
latter  types  of  requests  require  retrieval  of  information  from  disks.  It  is 
also  assumed  that  anywhere  between  five  and  twenty  tracks  of  information  will 
have  to  be  retrieved  and  searched  by  all  the  back-ends  in  order  to  answer  a 
retrieve,  delete  or  update  request.  If  more  than  one  track  has  to  be  retrieved 
by  a  back-end,  no  assumption  is  made  that  these  tracks  in  the  secondary  memory 
are  sequentially  next  to  each  other.  For  example,  consider  an  MDBS  system  with 
three  back-ends  and  assume  that  the  response  set  of  a  request  consists  of  six 
tracks  of  data.  Then,  the  data  placement  strategy  will  ensure  that  each  back¬ 
end  stores  two  of  the  six  tracks  of  the  response  set.  However,  the  two  tracks 
of  the  response  set  that  are  placed  at  a  back-end  are  assumed  to  be  randomly 
stored  on  its  disks.  In  Section  2.3.4,  we  will  present  another  simulation 
study  in  which  we  will  assume  that  if  more  than  one  track  has  to  be  retrieved 
by  a  back-end,  then  these  tracks  are  sequentially  placed  on  the  disk  tracks  of 
that  back-end.  Finally,  the  requests  are  assumed  to  arrive  in  a  Poisson  stream. 
This  assumption  implies  that  each  request  is  independent  of  all  others.  We 
have  also  simulated  the  systems  assuming  different  arrival  patterns,  and  the 
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results  do  not  differ  significantly  from  the  ones  with  Poisson  arrivals. 
Similarly,  simulations  have  been  run  with  different  percentages  of  insert  re¬ 
quests.  Again,  they  do  not  seem  to  affect  the  results  significantly  and  their 
results  are  therefore  not  presented  herein.  As  a  final  note,  the  method  of  sub¬ 
runs  [Fran77]  was  used  to  make  sure  that  the  results  were  unbiased  -  that  is, 
to  take  care  of  correlated  observations  in  the  simulation. 

The  actual  response  times  of  MDBS  under  the  various  placement  strategies 
are  shown  in  Table  I  for  various  request  interarrival  times  and  for  various 
number  of  back-ends.  The  first  observation  we  make  from  the  results  is  that 
the  performance  of  MDBS  can  be  improved  by  the  use  of  a  placement  strategy. 

Among  the  various  placement  strategies.  Strategy  C  is  the  best  and  Strategy  A 
is  the  worst  and  this  may  be  explained  intuitively  as  follows.  Consider  a 
system  with  three  back-ends  and  assume  that  a  particular  request's  response 
set  consists  of  195  records.  Furthermore,  let  us  assume  that  32  records  will 
fit  into  a  single  track.  In  all  the  three  strategies,  the  disk  drive  of  each 
back-end  would  have  stored  two  tracks  of  records  from  this  response  set,  making 
a  total  of  192  stored  records.  The  remaining  three  records  would  have  been 
stored  in  the  disk  drives  of  the  three  back-ends  on  the  basis  of  the  placement 
strategy  used.  Strategy  A  ensures  that  the  disk  drives  of  each  of  the  back¬ 
ends  will  contain  exactly  one  of  these  three  left-over  records.  Strategies  B 
and  C,  on  the  other  hand,  ensure  that  all  the  three  left  over  records  are 

placed  in  a  single  track  of  the  disk  drive  of  one  of  the  back-ends.  Strategy  A 

is  no  better  than  Strategies  B  and  C  because  the  time  to  retrieve  one  record 
from  the  secondary  memory  is  almost  the  same  as  the  time  to  retrieve  three  re¬ 
cords  from  the  secondary  memory.  This  is  because  the  minimal  disk  access  time 
is  the  time  to  access  a  track.  Let  us  denote  the  time  to  retrieve  an  entire 
track  of  records  from  the  secondary  memory  as  x.  Then,  in  Strategy  A,  each 
back-end  will  spend  3x  time  units  for  this  request.  On  the  other  hand,  in 
Strategies  B  and  C,  only  one  of  the  back-ends  will  spend  3x  time  units  for 
this  request.  The  other  two  back-ends  will  spend  only  2x  time  units  for  this 
request.  This  is  the  reason  for  the  improved  performance  of  Strategies  B  and  C. 

The  reason  why  Strategy  B  performs  worse  than  Strategy  C  may  be  explained 
as  follows.  In  Strategy  B,  after  dividing  the  tracks  of  a  response  set  equally 
among  the  back-ends,  the  extra,  say,  i  tracks  are  assigned  to  the  first  i  back¬ 
ends.  As  a  result,  back-end  if 1  will  always  take  the  longest  time  to  answer  a 

retrieve  (delete  or  update)  request.  The  response  time  for  this  request  is 
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proportional  to  the  time  taken  by  back-end  #1  to  answer  the  request,  since  it  always 
takes  the  longest  time.  In  general,  back-end  #1  will  do  more  work  and  take 
more  time  than  any  of  the  other  back-ends,  back-end  if2  will  do  more  work  and 
take  more  time  than  back-ends  #3,  if 4,  if 5,  and  so  on.  In  Strategy  C,  however, 
the  workload  distribution  is  more  even  owing  to  the  fact  that  after  initial 
distribution  the  excess  tracks  are  assigned  randomly  to  the  back-ends.  Hence, 
no  one  back-end  does  more  work  and  take  more  time  than  any  other. 

In  order  to  emphasize  the  advantages  of  a  good  placement  strategy,  i.e.. 
Strategy  C,  over  one  where  no  strategy  is  used,  in  Table  II,  we  tabulate  the 
response  time  ratios  in  these  two  cases  (the  ratios  are  calculated  from  data  in 
Table  I) .  It  is  seen  that  the  response  time  can  be  improved  by  a  factor  of  as 
much  as  6.67  with  good  data  placement.  Furthermore,  it  is  seen  that  for  larger 
number  of  back-ends,  the  ratio  is  larger.  This  is  an  interesting  result,  since 
it  tells  us  that  the  effect  of  our  proposed  placement  strategy  will  become  more 
evident  at  larger  of  number  of  back-ends.  As  we  are  trying  to  develop  an  ex¬ 
tensible  system,  such  a  result  is  encouraging.  It  implies  two  prospects: 

(1)  The  response  time  of  MDBS  will  be  better  than  the  response  time  of  a  sys¬ 
tem  that  does  not  use  a  'good*  data  placement  strategy.  (2)  The  response  time 
of  MDBS  will  improve  as  the  number  of  back-ends  of  MDBS  is  increased.  The 
greater  the  number  of  back-ends,  the  better  the  response  time. 

2.3.3  An  Evaluation  of  the  Data  Placement  Strategies  Using  More  Refined  Assumptions 

In  the  previous  simulation  study,  we  assumed  that  the  data  constituting 
the  response  set  were  evenly  distributed  among  all  the  back-ends.  We  also  as¬ 
sumed  that  the  data  of  the  response  set  at  any  one  back-end  were  randomly  placed 
and  not  necessarily  next  to  each  other  on  the  disk.  We  now  focus  on  more  re¬ 
fined  simulation  by  making  the  assumptions  more  realistic.  Since  a  large  amount 
of  data  being  read  and  manipulated  by  a  back-end  is  likely  stored  on  the  disk 
sequentially,  we  will  not  assume  that  they  are  placed  randomly  on  the  disk. 

Instead,  we  assume  now  that  they  are  likely  placed  sequentially,  i.e.,  one 
track  followed  by  another  track.  We  also  use  more  refined  blocking  factors. 

Instead  of  dividing  data  into  tracks,  we  now  assume  that  the  data  will  be  divi¬ 
ded  into  smaller  units,  known  as  pages.  Let  there  be  n  back-ends  in  MDBS. 

Also,  let  a  request  require,  on  the  average,  the  retrieval  of  x  records.  Finally, 
let  these  x  records  be  stored  as  s  groups  of  x/s  records  each.  The  value  of  s 
determines  the  amount  of  sequentiality  that  is  present  in  the  data  being  re- 
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trieved.  For  example.  In  the  extreme  case,  s  is  of  value  one.  This 
implies  that  all  the  data  being  retrieved  is  stored  sequentially,  since  they 
are  in  one  group.  The  larger  the  value  of  s,  the  greater  the  randomness  with 
which  the  records  being  retrieved  are  scattered  over  the  disk  tracks  at  a 
back-end . 

For  data  placement,  the  following  three  new  strategies  are  considered. 

(1)  Place  —  records  from  each  group  at  every  back-end.  That  is,  g( — ) 

sn  ST1 

records  of  a  group  are  placed  in  some  of  the  back-ends  and  h(— )  re¬ 
cords  are  placed  in  the  remaining  back-ends,  where  g(y)  stands  for  the 
nearest  integer  equal  to  or  greater  than  y,  and  h(y)  stands  for  the 
nearest  integer  equal  to  or  less  than  y. 

(2)  Let  a  page  accommodate  p  records.  Then,  the  —  records  of  a  group 


are  stored  in  g( — )  pages  and  each  back-end  receives 

/fe>\ 


That  is,  certain  back-end  receives  g 


back-end  receives 


(~)\ 
sp  j 

n  / 


m 


pages. 


pages  and  certain  other 


pages . 


(3)  Let  a  track  store  t  records.  Then,  the  —  records  of  a  group  are 

s 

X  8(it) 

stored  in  g(— -)  tracks.  Each  back-end  receives  -  tracks  of 

°  st  n 

/  ,  x  v\ 


data.  That  is,  certain  back-end  receives  g 


and  certain  other  receives 


m 


tracks  of  data 


tracks  of  data. 


After  initial  placement  of  data,  the  remaining  data  will  be  placed  in  the 


/g(fr>  \ 

following  way.  For  Strategy  2,  some  i  of  n  back-ends  receive  g  f — — J  pages 
and  the  remaining  (n-i)  back-ends  receive  one  page  less.  The  i  back-ends  which 
receive  one  page  more  may  be  either  the  first  i  back-ends  or  any  i  consecutive 
back-ends  starting  from  a  randomly  chosen  back-end  which  gives  rise  to  two  varia¬ 
tions  of  Strategy  2.  We  do.'not  consider  other  possible  variations  of  Strategy  2 
and  leave  them  for  future  research.  Similarly,  two  corresponding  variations  of 
Strategy  3  are  also  considered. 

Strategy  1  is  the  same  as  Strategy  A  discussed  in  Section  2.3.2.  Also,  the 
two  variations  of  Strategy  3  are  similar  to  Strategies  B  and  C  of  Section  2.3.2. 
The  two  variations  of  Strategy  2  are  the  counterparts  of  Strategies  B  and  C  of 
Section  2.3.2  where  the  division  is  done  at  page  boundaries  rather  than  at  track 


pages 


boundaries. 

Five  separate  simulation  models  of  MDBS,  one  for  each  of  the  afore¬ 
mentioned  data  placement  strategies,  are  designed  and  evaluated  using  SIMULA 
on  a  DEC  System  20.  The  assumptions  made  in  these  simulations  are  similar  to 
the  ones  made  in  the  simulation  of  Section  2.3.2  and  will  not  be  repeated 
here. 


2. 3. 3.1  The  Choice  of  a  Superior  Strategy  for  Data  Placement  on  the  Basis 
of  Better  Response  Time 

In  discussing  the  new  simulation  results,  we  shall  refer  to  Strategy  1 
as  the  exact  division  strategy;  Strategy  2  as  the  page  splitting  strategy. 

The  two  variations  of  Strategy  2  are  called  page  splitting  with  placement  from 
the  first  back-end  and  page  splitting  with  random  placement,  respectively. 
Similarly,  the  two  variations  of  Strategy  3  are  referred  to  as  track  splitting 
with  placement  from  the  first  back-end  and  track  splitting  with  random  placement. 
The  number  of  records  needed  to  be  retrieved  for  a  request  depends  upon  whether 
the  size  of  the  request  is  small,  or  large.  A  small-sized  request  is  one  which 
requires  the  retrieval  of  one  to  320  records.  A  medium-sized  request  is  one 
which  requires  the  retrieval  of  between  320  and  1,280  records.  Large-sized 
requests  require  the  retrieval  of  between  1,280  and  6,400  records.  We  note 
that  64  records  can  fit  in  a  track.  Thus,  the  retrieval  of  between  320  and 
1,280  records  is  equivalent  to  the  retrieval  of  between  5  and  20  tracks  of  re¬ 
cords. 

The  number  of  groups  into  which  the  retrieved  records  fall  is  chosen  from 
the  set  {1,5,20}.  Choosing  one  as  the  number  of  groups  implies  that  all  the 
data  are  sequentially  related.  Thus,  they  form  a  single  group.  Choosing  20 
as  the  number  of  groups  implies  on  the  other  hand  that  there  are  20  random  data 
aggregates,  although  data  in  the  individual  aggregates  may  be  sequentially  re¬ 
lated.  Another  parameter  which  is  varied  is  the  number  of  back-ends  involved. 
This  is  chosen  from  the  set  {5,10,15}. 

Tables  Ilia,  Illb  and  IIIc,  show  the  simulation  results  for  small-sized, 
medium-sized  and  large-sized  requests,  respectively. 

The  results  Indicate  that  the  track-splitting-with-random-placement  stra¬ 
tegy  is  the  best  one  over  all  possible  numbers  of  back-ends,  of  groups  and  of 
of  the  request  size.  Only  when  the  number  of  groups  is  one  does  the  superior¬ 
ity  of  this  strategy  become  unclear.  Setting  the  number  of  groups  to  one  implies 
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Number  of  records  retrieved  is  between  1  and  320 
Inter-arrival  time  ■  200  msecs 
Small-Sized  requests 
Response  time  results  in  msecs 


NUMBER  OF  GROUPS  -  1 


Number  of 
Back-ends 

Exact 

Division 

Page 

Splitting 

Round 

Robb in 

Page 

Splitting 

Random 

Track 

Splitting 

Round 

Robbin 

Track 

Splitting 

Random 

5 

60.4 

40.1 

40.9 

39.4 

40.3 

10 

36.6 

36.3 

36.9 

38.9 

38.7 

15 

35.3 

35.1 

36.1 

38.8 

37.8 

Inter-arrival  time  =  200  msecs 
NUMBER  OF  GROUPS  -  5 


Number  of 
Back-ends 

Exact 

Division 

Page 

Splitting 

Round 

Robbin 

Page 

Splitting 

Random 

Track 

Splitting 

Round 

Robbin 

Track 

Splitting 

Random 

5 

358 

343 

343 

501 

89.8 

10 

338 

317 

298 

485 

66.8 

15 

332 

306 

252 

485 

57.6 

Inter 

-arrival  time 

*  1000  msecs 

NUMBER  OF  GROUPS  =  20 

Number  of 
Back-ends 

Exact 

Division 

Page 

Splitting 

Round 

Robbin 

Page 

Splitting 

Random 

Track 

Splitting 

Round 

Robbin 

Track 

Splitting 

Random 

5 

977 

929 

726 

1250 

266 

10 

969 

886 

421 

1230 

159 

11 

961 

_ g?L 

305 

1230 

130 

Table  Ilia.  Response  Time  Results  for  the  Various 
Strategies  for  Small-Sized  requests 
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Number  of  records  retrieved  is  between  320  and  1280 
Inter-arrival  time  *  200  msecs 
Medium-Sized  requests 
Response  time  results  in  msecs 


NUMBER  OF  GROUPS  ■*  1 


Number  of 
Back-ends 

t 

Exact 

Division 

Page 

Splitting 

Round 

Robb in 

Page 

Splitting 

Random 

Track 

Splitting 

Round 

Robb in 

Track 

Splitting 

Random 

i - 

!  5 

75.9 

75.6 

77 

74.8 

75 

'  10 

1 

52.8 

52.6 

53.4 

50.6 

51.2 

15 

45.8 

45.5 

47.4 

44.6 

46.9 

Inter-arrival  time  ■  200  msecs 


NUMBER  OF  GROUPS  =  5 


Number  of 
Back-ends 

Exact 

Division 

Page 

Splitting 

Round 

Robb in 

Page 

Splitting 

Random 

Track 

Splitting 

Round 

Robb in 

Track 

Splitting 

Random 

5 

617 

597 

680 

535 

244 

10 

432 

421 

420 

504 

143 

15 

389 

379 

379 

494 

. 

109 

Inter-arrival  time  «*  1000  msecs 


NUMBER  OF  GROUPS  -  20 


Number  of 
Back-ends 

Exact 

Division 

Page 

Splitting 

Round 

Robb in 

Page 

Splitting 

Random 

Track 
Splitting 
Round 
Robb in 

Track 

Splitting 

Random 

5 

1060 

1050 

1000 

1250 

246 

10 

1010 

996 

987 

1230 

159 

15 

996 

969 

868 

1230 

131 

- — _ 

Table  Illb.  Response  Time  Results  for  the  Various 
Strategies  for  Medium-Sized  Requests 


Number  of  records  retrieved  is  between  1280  and  6400 
Inter-arrival  time  =  500  msecs 
Large-Sized  requests 
Response  time  results  in  msecs 


NUMBER  OF  GROUPS  =  1 


Number  of 
Back-ends 

Exact 

Division 

Page 

Splitting 

Round 

Robb in 

Page 

Splitting 

Random 

Track 
Splitting 
Round 
Robb in 

Track 

Splitting 

Random 

5 

238 

237 

245 

235 

242 

10 

120 

119 

121 

119 

121.0 

15 

87.1 

86.9 

91.1 

86.2 

92.2 

Inter-arrival  time  =  500  msecs 


NUMBER  OF  GROUPS  -  5 


Number  of 
Back-ends 

‘  1  "H 

Exact 

Division 

Page 

Splitting 

Round 

Robbin 

Page 

Splitting 

Random 

Track 

Splitting 

Round 

Robbin 

Track 

Splitting 

Random 

5 

517 

513 

571 

491 

465 

10 

308 

305 

312 

290 

265 

15 

258 

256 

266 

_ 

247 

218 

Inter-arrival  time  =  1000  msecs 

NUMBER  OF  GROUPS 

-  20 

Number  of 
Back-ends 

Exact 

Division 

Page 

Splitting 

Round 

Robin 

Page 

Splitting 

Random 

Track 

Splitting 

Round 

Robin 

Track 

Splitting 

Random 

5 

1580 

1560 

1540 

1360 

866 

10 

1220 

1210 

1240 

1270 

435 

.  15  _J 

1130 

1110 

1090 

1250 

307 

Table  IIIc.  Response  Time  Results  for  the  Various 
Strategies  for  Large-Sized  Requests 
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that  all  the  data  in  the  database  must  be  stored  sequentially.  This  is  a  rare 
occurrence  in  database  systems.  Should  this  happen,  all  the  strategies  are  ap¬ 
proximately  equal  in  terms  of  the  resulting  response  times.  However,  the  track- 
splitting-with-random-placeraent  strategy  leads  to  very  dramatic  improvements  in 
response  time  over  the  next  best  strategy  for  the  larger  number  of  groups.  The 
superiority  of  this  strategy  over  all  other  strategies  is  most  evident  for  small 
to  medium-sized  requests  and  when  the  data  tends  to  be  more  randomly  distributed. 
Situations  where  the  average  response  time  of  MDBS  using  the  next  best  strategy 
is  five  times  slower  than  the  response  time  of  MDBS  using  the  track-splitting- 
with-random-placement  strategy  are  noticed.  For  instance,  for  small-sized  re¬ 
quests,  when  the  number  of  groups  is  five  and  the  number  of  back-ends  is  fifteen, 
the  track-splitting-with-random-placement  strategy  leads  to  an  average  response 
time  of  57.6  msecs.  The  nearest  rival,  which  is  the  page-splitting-with-random- 
placement  strategy,  leads  to  a  response  time  of  252  msecs  which  is  almost  five 
times  slower.  The  track-splitting  strategy  performs  better  than  the  page-splitting 
strategy  for  exactly  the  same  reason  as  that  Strategy  C  of  Section  2.3.2  performs 
better  than  Strategy  A  of  the  same  section. 

Thus,  from  a  performance  point  of  view,  the  best  data  placement  strategy  is 
the  one  where  the  response  set  is  divided  up  into  tracks  of  data  and  stored  as 
multiples  of  data  tracks.  Should  there  be  extra  data  tracks,  after  even  distri¬ 
bution  among  the  back-ends,  they  are  assigned  to  the  disks  of  consecutive  back¬ 
ends  starting  from  a  randomly  selected  back-end. 

In  Chapter  IV,  we  will  relate  the  notion  of  clusters  with  the  notion  of 
groups.  Clusters  are  formed  on  the  basis  of  their  attributes  and  their  potential 
utilization.  However,  the  placement  of  the  clusters  is  related  to  the  placement 
strategies  of  the  groups  which  relies  on  the  present  simulation  studies. 

2.3. 3.2  The  Choice  of  a  Superior  Data  Placement  Strategy  on  the  Basis 
of  Better  Storage  Utilization 

Let  us  now  consider  the  various  strategies  from  the  point  of  view  of  storage 
utilization.  In  this  case,  the  comparison  is  between  a  strategy  which  stores 
groups  of  records  in  multiples  of  tracks  (as  in  Strategy  3)  and  a  strategy  which 
stores  groups  of  records  in  multiples  of  pages  (as  in  either  Strategies  1  or  2) . 

Let  us  call  the  former  the  track-splitting  strategies  and  the  latter  the  page¬ 
splitting  strategies. 

There  are  two  factors  affecting  storage  utilization:  First,  there  is  the 


un-utilized  space  owing  to  the  fact  that  records  do  not  exactly  fit  in  a  page 
(or  a  track).  Thus,  if  a  page  can  hold  only  512  bytes  and  each  record  is  of 
size  200  bytes,  112  bytes  on  each  page  is  wasted.  The  second  factor  that  leads 
to  un-utilized  space  is  the  fact  that  each  group  of  records  ends  on  page  bound¬ 
aries  (or  track  boundaries)  and  records  from  two  different  groups  are  never 
placed  on  the  same  page  (or  track).  The  first  factor  is  favorable  to  track¬ 
splitting  strategies  and  the  second  to  page-splitting  strategies. 

Consider  again  that  there  are  x  records  to  be  retrieved  and  that  they  are 
to  be  retrieved  as  s  groups  of  —  records  each.  In  the  page-splitting  stragety 

XX  X 

the  —  records  are  stored  in  g( — )  pages.  The  wasted  space  in  g( — )  -  1  of 
s  sp  sp 

these  pages  is  the  difference  between  the  page  size  and  the  p  record  sizes.  The 
wasted  space  on  the  last  page  is 

X  X 

page  size  -  (—  “(gC^)  -  l)p)  x  record  size 

Let  page  size  *  p',  record  size  =  r,  and  track  size  =  t'.  Then,  percentage  of  wasted 
space  in  the  page  splitting  strategy  is 

(g(^)-l)(p’-pr)  +  p*  -  (f-  (8(JL)-l)p)r 

Similarly,  the  percentage  of  wasted  space  in  the  track-splitting  strategy  is 
(g(~)-l)(t’-tr)  +  t'  -  (f-  (g(^)-l)t)r 

The  percentage  of  wasted  space  will  depend  on  the  size  of  a  record  and  the 

number  of  records  of  a  group.  Some  experimental  results  are  shown  in  Table  IV. 

The  record  size  is  chosen  from  the  {200,  300,  400}  bytes,  and  the  value  of  — 

s 

is  chosen  from  the  set  {20,  50,  100}.  The  results  are  as  follows.  In  either 
strategy,  the  wasted  space  decreases  when  the  record  size  is  increased.  The 
page  splitting-strategy  has  less  wasted  space  when  the  amount  of  sequentiality 
in  the  data  being  retrieved  is  low.  However,  if  a  good  clustering  policy  is 
employed  so  that  more  of  the  records  that  are  likely  to  be  retrieved  together 
are  stored  together,  then  the  track-splitting  strategy  will  waste  less  space. 

In  Chapter  IV,  we  will  present  such  a  policy.  In  conclusion,  we  shall  use  the 
track-splitting  strategy  in  MDBS  both  for  the  good  response  time  and  for  the 
low  storage  wasted. 
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PAGE-SPLITTING  POLICY 
Number  of  Records  Per  Group 


Record 

Size 

(bytes) 

30 

60 

90 

200 

.219 

.219 

.219 

300 

.414 

.414 

.414 

400 

.219 

.219 

.219 

500 

.023 

.023 

.023 

TRACK-SPLITTING  POLICY 
Number  of  Records  Per  Group 


Record 

Size 

(bytes) 

30 

60 

90 

200 

.625 

.023 

.023 

300 

.438 

.023 

.023 

400 

.250 

.023 

.023 

500 

.063 

.023 

.023 

Table  IV.  Comparison  of  Storage  Wastage  Between 

TVo  Different  Placement  Policies  are  Used. 
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2.3.4  Next  Step  in  the  Design  Process 

We  have  now  gone  one  step  further  in  our  design  of  MDBS.  We  have  decided 
that  it  is  going  to  be  a  dedicated  system  in  which  each  back-end  has  attached 
to  it  a  number  of  disk  drives  which  are  not  accessible  from  any  other  back-end. 
As  a  result,  the  controller  and  device  limitation  problems  are  eliminated  in 
MDBS.  Furthermore,  we  have  decided  to  use  a  data  placement  strategy  in  order 
to  store  the  data  in  the  disk  drives  in  an  evenly  distributed  manner  for  the 
improvement  of  the  response  time  and  storage  utilization.  Such  a  strategy  will 
be  proposed  in  details  in  Chapter  IV.  We  also  argue  that  such  hardware  confi¬ 
guration  and  data  placement  will  ensure  that  all  back-ends  will  participate 
in  the  execution  of  a  request  and  eliminate  the  back-end  limitation  problem. 
Unlike  DIRECT,  we  do  not  need  a  costly  interconnection  matrix  in  order  to 
achieve  this.  Finally,  we  will  show  in  the  next  section  that  as  a  result  of 
our  data  placement  strategy  the  amount  of  control  message  traffic  needed  in 
MDBS  is  much  less  than  in  DIRECT.  We  will  also  propose,  in  the  next  section, 
the  incorporation  of  a  broadcast  capability  in  MDBS  for  further  minimizing 
control  message  traffic. 

2.4  Second  Design  Decision  -  Minimizing  the  Problem  of  Control  Message  Traffic 

Consider  a  DIRECT  system  with  three  query  processors  executing  a  request 

that  requires  access  to  nine  pages  of  memory.  The  execution  of  such  a  request 
requires  three  messages  to  be  sent  from  and  to  be  received  by  each  of  the  three 
query  processors.  It  also  requires  the  controller  to  send  and  receive  nine 
messages.  The  figures  are  only  for  messages  that  are  sent  and  received  by  the 
processors  asking  for  page  frames  and  do  not  include  the  three  messages  that 
must  be  sent  by  the  controller  in  order  to  initiate  the  query  processors  nor 
does  it  include  one  message  from  each  of  the  query  processors  when  they  out¬ 
put  the  results.  In  all,  DIRECT  exchanges  24  messages  while  executing  this 
request  for  nine  pages. 

MDBS,  on  the  other  hand,  will  require  only  six  messages.  Three  messages 
are  needed  to  send  the  request  to  the  three  back-ends  and  one  message  will  be 
received  from  each  of  the  back-ends  when  they  output  the  results. 

The  number  of  messages  needed  in  MDBS  may  be  further  reduced  to  four  if  we 
have  a  broadcast  capability.  Our  feeling  that  a  broadcast  capability  is  impor¬ 
tant  for  MDBS  is  prompted  by  the  following.  Since  the  data  placement  strategy 
ensures  that  all  back-ends  will  be  participating  in  answering  a  request,  the 
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request  must  now  be  sent  to  every  back-end.  Thus,  with  a  broadcast  capability 
of  a  system  of  n  back-ends,  we  need  only  a  single  message  rather  than  n  mes¬ 
sages  to  broadcast  a  request.  The  simulation  experiment  in  the  next  section 
illustrates,  graphically,  the  advantages  of  a  broadcast  capability. 

2.4.1  The  Need  for  a  Broadcast  Capability 

Two  sets  of  simulations  are  run  -  one  for  MDBS  without  the  broadcast 
capability  and  one  for  MDBS  with  such  a  capability.  Once  again,  the  simula¬ 
tion  programs  are  written  in  SIMULA  on  a  DEC  System  20.  It  is  assumed,  as  in 
the  earlier  simulation  experiments,  that  four  types  of  requests  (i.e.,  retrieve, 
insert,  delete  and  uodate)  are  issued  and  that  25%  of  all  requests  are  of  the 
insert  type.  Also,  as  before,  it  is  assumed  that  anywhere  between  five  and 
twenty  tracks  will  have  to  be  retrieved  and  searched  in  order  to  answer  a  re¬ 
trieve,  delete  or  update  request.  The  requests  are  assumed  to  arrive  in  a 
Poisson  stream.  Finally,  it  is  assumed  that  the  time  taken  to  retrieve  records 
from  the  secondary  memory  dominates  the  CPU  processing  time  so  that  the  latter 
time  may  be  ignored.  Simulations  are  run  for  various  number  of  back-ends  and 
various  request  inter-arrival  times.  The  response  time  results  are  tabulated 
in  Table  Va. 

Table  Va  indicates  that  the  inclusion  of  a  broadcast  capability  can  make 
the  MDBS  system  up  to  2.5  times  as  fast  as  it  would  be  without  such  a  capability. 
Also,  the  utility  of  such  a  capability  becomes  more  evident  for  larger  number 
of  back-ends.  This  is,  of  course,  to  be  expected.  It  should  be  pointed  out 
that  the  improved  response  time  in  the  case  of  MDBS  with  the  broadcast  capability 
is  a  direct  result  of  this  capability,  since  the  control  message  traffic  is  reduced. 

2.4.2  An  Evaluation  of  the  Broadcast  Capability  with  More  Refined  Assumptions 

The  results  of  Table  Va  assumed  that  the  tracks  to  be  retrieved  from  the 

secondary  memory  of  a  back-end  were  distributed  randomly  across  the  disk  drives 
of  that  back-end.  Since  each  track  requires  the  time  for  seek  and  rotation  we 
now  consider  that  the  tracks  to  be  retrieved  at  a  back-end  are  clustered 
in  such  a  way  that  the  seek  time  may  be  unnec ~sary  for  all  except  the  first 
track  retrieved.  The  number  of  back-ends  is  chosen  from  the  set  {5,  10,  15}. 

Another  parameter  that  is  varied  Is  the  number  of  tracks  that  must  be  retrieved 
for  a  typical  request.  A  small-sized  request  requires  the  retrieval  of  between 
one  and  five  tracks.  A  medium-sized  request  requires  the  retrieval  of  between 
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Table  Va.  Response  Time  for  MDBS  with  and 
without  Broadcast  Facility 


five  and  twenty  tracks.  Finally,  for  large-sized  requests,  the  number  of  tracks 
to  be  retrieved  varies  from  twenty  to  one  hundred.  The  results  for  small-sized, 
medium-sized,  and  large-sized  requests  are  shown  in  Table  Vb,  Vc  and  Vd,  re¬ 
spectively. 

Consider  the  results  for  the  case  where  the  number  of  tracks  to  be  retrieved 
varies  from  one  to  five.  With  an  interarrival  time  of  one  request  every  200 
msecs,  the  average  response  times  for  5,  10  and  15  back-ends  using  broadcast  is 
31.7,  20.6  and  17.2  msecs,  respectively.  Thus,  the  response  time  decreases 
with  increasing  number  of  back-ends  as  expected.  On  the  other  hand,  the  corres¬ 
ponding  response  times  for  the  case  where  no  broadcast  capability  is  assumed  are 
46.2,  59.3  and  87.0  msecs,  respectively.  That  is,  the  response  time  actually 
increases  with  increasing  number  of  back-ends!  This  is  because  the  message 
sending  time  is  dominating  the  secondary  memory  access  time  (since  the  number 
of  tracks  to  be  retrieved  is  so  few).  By  comparing  the  set  of  response  times 
with  the  broadcast  capability  with  the  corresponding  set  of  response  times 
without  such  a  capability,  we  see  that  the  use  of  broadcast  may  result  in  50%, 
130%  and  400%  of  improvements  for  the  respective  configurations  of  5,  10  and 
15  back-ends.  The  results  remain  to  be  the  same,  even  when  the  interarrival 
rate  is  increased  to  one  request  every  100  msecs. 

Next,  let  us  consider  the  results  for  the  case  where  the  requests  are 
medium-sized  and  where  the  requests  require  the  retrieval  of  between  five  and 
twenty  tracks  of  data.  Once  again,  the  response  time  of  MDBS  without  the  broad¬ 
cast  capability  increases  with  increasing  number  of  back-ends.  Thus,  with  an 
interarrival  time  of  one  request  every  100  msecs  and  five  back-ends,  the  response 
time  is  85.1  msecs.  However,  the  response  time  with  ten  back-ends  is  87.4  msecs 
and  with  15  back-ends  it  is  104  msecs.  On  the  other  hand,  the  system  with  the 
broadcast  capability  behaves  in  an  increasingly  better  manner.  The  correspond¬ 
ing  figures  in  this  case  are  96.4,  56.7  and  43.2.  The  percentage  of  improvement 
gets  to  be  as  high  as  250%  or  2.5  times  better. 

Finally,  consider  the  results  for  the  case  where  the  number  of  tracks  re¬ 
trieved  is  between  twenty  and  one  hundred.  The  effect  of  not  having  a  broadcast 
capability  is  expected  to  be  felt  the  least  under  such  circumstances.  The  simu¬ 
lation  results  certainly  bear  out  this  intuitively  expected  result.  For  the 
first  time,  the  system  without  the  broadcast  capability  shows  an  improvement  in 
response  time  when  the  number  of  back-ends  is  increased  from  five  to  ten.  How¬ 
ever,  the  performance  of  the  system  does  not  improve  when  the  number  of  back-ends 
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Response  Time  Tables  (in  msecs) 
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is  increased  from  ten  to  fifteen.  In  contrast,  the  performance  of  the  system 
with  the  broadcast  capability  improves  constantly  with  ever  increasing  number 
of  back-ends.  When  the  number  of  back-ends  is  15,  we  note  that  the  system 
with  the  broadcast  capability  still  outperforms  the  system  without  the  broad¬ 
cast  capability  by  90%. 

In  conclusion,  the  use  of  a  broadcast  capability  can  be  very  important. 

The  need  becomes  more  acute  when  the  number  of  back-ends  is  large  and  when 
typical  requests  do  not  require  the  retrieval  of  large  amounts  of  data.  A  sys¬ 
tem  without  broadcast  capability  can  behave  anomalously  in  that  the  response 
time  actually  increases  with  increasing  number  of  back-ends.  Such  degradation 
in  response  time  is  primarily  due  to  the  increase  in  control  message  traffic. 

We  wish  to  emphasize  that  the  ability  to  broadcast  is  essentially  attained 
by  software  means.  That  is,  no  special-purpose  hardware  is  needed  in  order  to 
achieve  a  broadcast  capability.  There  are  many  examples  of  systems  that  pro¬ 
vide  a  broadcast  capability.  For  instance,  Ethernet  [Metc76],  which  is  a 
coaxial  cable  network,  provides  such  a  capability.  Another  scheme  which  pro¬ 
vides  a  broadcast  capability  is  the  time-shared  bus.  Three  different  commonly 
used  techniques  for  achieving  a  broadcast  capability  using  a  time-shared  bus 
are  described  in  [Tane81]  and  will  not  be  repeated  here.  In  conclusion, 
broadcast  capability  is  achieved  by  using  appropriate  software  and  requires 
no  special-purpose  hardware.  Furthermore,  such  software  is  available  from 
many  vendors  and  is  being  commonly  used.  Hence,  our  proposal  for  using  a 
broacast  capability  in  MDBS  does  not  lead  to  the  problem  of  hardware  speciali¬ 
zation. 

2.5  An  Overview  of  the  MDBS  Architecture  and  Design 

At  this  point,  our  proposed  architecture  looks  as  shown  earlier  in  Figure  7 
It  consists  of  a  single  controller  attached  to  a  number  of  back-ends  by  way  of 
a  bus  or  a  local  network  (such  as  an  Ethernet)  which  provides  a  broadcast  capa¬ 
bility.  A  back-end  is  attached  with  a  number  of  disk  drives  which  may  be 
accessed  only  by  that  back-end.  Since  we  do  not  use  special-purpose  hardware, 
we  have  therefore  solved  the  problem  of  hardware  specialization  in  MDBS.  In 
Chapter  1,  we  had  listed  eleven  different  issues  which  had  to  be  studied  in  the 
context  of  a  multiple  back-end  system.  Let  us  briefly  consider  those  issues 
that  have  been  resolved  for  MDBS  at  this  point  of  the  design  study. 

The  first  issue  was  regarding  the  optimal  way  of  interconnecting  a  large 


number  of  back-ends  and  the  optimal  way  of  connecting  the  controller  to  the 
back-ends.  We  have  decided  on  a  scheme  whereby  all  the  back-ends  and  the 
controller  are  attached  to  a  bus.  This  provides  the  controller  with  the  abili¬ 
ty  to  broadcast  requests  to  all  the  back-ends  and  leads  to  a  minimization  of 
the  control  message  traffic  problem. 

Another  issue  concerns  the  placement  of  data  aggregates  of 
the  database  across  the  back-ends.  This  issue  has  also  been  resolved  to  our 
satisfaction.  We  have  shown,  by  simulation,  a  strategy  with  jhich  the  response 
sets  to  all  requests  may  be  evenly  distributed  among  the  back-ends  and  the  extra 
tracks  of  data  are  assigned  to  consecutive  back-ends  starting  from  a  randomly 
selected  back-end.  The  strategy  is  best  in  terms  of  minimum  response  time  and 
minimum  storage  wastage.  We  have  termed  it  the  track-splitting-with-random- 
placement  strategy.  This  strategy  thus  eliminates  the  back-end  limitation  pro¬ 
blem. 

Another  issue  we  had  mentioned  in  Chapter  1  was  the  database  store  inter¬ 
connection  problem  -  that  is,  the  attachment  of  disk  drives  to  back-ends.  We 
have  chosen  to  adopt  the  dedicated  approach  because  it  eliminates  the  channel 
limitation  problem.  Furthermore,  by  attaching  more  than  one  disk  drive  to  each 
back-end,  we  have  eliminated  the  device  limitation  problem. 

The  next  issue  concerns  the  execution  strategy.  That  is,  should  a  SIMD 
or  a  MIMD  approach  be  used.  Our  proposed  design  operates  in  a  MIMD  fashion. 

As  much  as  possible,  the  different  back-ends  are  made  to  work  on  the  same  request. 
However,  when  a  back-end  finishes  the  execution  of  a  request,  the  back-end  is  to 
start  the  execution  of  the  next  request.  In  other  words,  the  different  back-ends 
are  executing  requests  in  an  asynchronous  fashion.  Asynchronous  execution  is 
achieved  in  MDBS  by  having  a  queue  of  requests  at  each  back-end.  Whenever  a 
back-end  finishes  a  request,  it  picks  up  the  next  request  from  its  queue  and 
begins  its  execution.  As  we  had  indicated  in  our  survey  of  DIRECT,  this  saves 
the  need  for  a  back-end  to  send  a  message  to  the  controller  after  execution  of 
each  request  and  the  need  for  the  controller  to  respond  with  the  next  request 
in  the  controller  queue.  Thus,  assuming  a  message  time  of  eight  milliseconds, 
this  proposal  will  save  sixteen  milliseconds  per  request  over  a  proposal  where 
no  queue  of  requests  is  maintained  at  each  back-end.  Additionally,  it  allevi¬ 
ates  the  controller  limitation  problem  to  some  extent  by  removing  the  queue 
handling  software  from  the  controller  and  placing  it  at  the  back-ends.  In  fact, 
the  amount  of  overhead  associated  with  the  controller  is  exactly  the  same  as 
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would  have  been  associated  with  It  If  the  system  had  been  executing  In  an  SIMD 
mode.  In  other  words,  our  proposal  for  a  queue  at  each  back-end  allows  us  to 
reap  the  benefits  of  the  MIMD  mode  of  execution  at  the  price  of  the  SIMD  mode 
of  execution. 

Thus,  five  of  the  nine  design  goals  we  set  for  ourselves  in  Section  2.2.1 
have  been  achieved  at  this  point.  We  will  achieve  the  goal  of  finding  a  ca¬ 
nonical  data  model  in  Chapter  III.  We  will  achieve  the  goal  of  using  identical 
software  in  Chapter  IV.  The  means  for  achieving  the  goal  of  eliminating  the 
multiple  request  execution  problem  is  described  in  Chapter  VI.  Finally,  in 
order  to  achieve  our  goal  of  alleviating  the  controller  limitation  problem, 
the  directory  management,  security  enforcement  and  concurrency  control  algori¬ 
thms  are  carefully  designed.  These  algorithms  are  described  in  Chapters  IV,  V 
and  VI,  respectively. 


-  62  - 


3.  THE  CHOICE  OF  A  DATA  MODEL  AND  A  DATA  MANIPULATION  LANGUAGE 

In  this  chapter,  we  will  develop  a  canonical  data  model  for  MDBS  imple¬ 
mentation.  A  canonical  model  must  meet  the  following  three  criteria:  the  transla¬ 
tion  criterion,  the  partition  criterion  and  the  language  criterion. 

These  criteria  will  be  elaborated  in  the  following  sections.  An  attribute- 

based  model  is  proposed  herein  which  is  the  best  candidate  to  meet  the  criteria. 

With  the  canonical  data  model  as  a  basis,  we  then  propose  a  data  manipulation 
language  in  which  users  may  issue  requests  to  MDBS.  The  language  also  encom¬ 
passes  the  useful  notion  of  a  transaction.  Finally,  in  this  chapter,  we  contrast 
the  basic  requests  with  the  aggregate  requests.  The  latter  perform  aggregate 
operations  such  as  summation,  average,  maximum  and  minimum,  over  attribute  values. 

3.1  Three  Selection  Criteria 

We  will  discuss  the  criteria  informally  and  briefly  herein.  A  detailed 
discussion  of  the  criteria  will  be  included  in  the  course  of  developing  the 
canonical  model,  known  as  the  attribute-based  model. 

3.1.1  The  Translation  Criterion 

All  the  systems  surveyed  In  Chapter  II  [Auer80,  Dewi78,  Miss80,  Ston78] 
support  the  relational  model  of  data.  On  the  other  hand,  a  large  number  of 
operational  database  systems  [Datayy,  Idmsyy,  Systyy,  Adabyy,  Totayy]  support 
either  the  hierarchical  or  the  network  data  model.  Therefore,  there  is  the 
need  to  translate  the  existing  hierarchical  and  network  databases  into  rela¬ 
tional  databases  before  they  may  be  used  on  relational  systems.  Such  transla¬ 
tion  is  a  part  of  general  studies  of  converting  a  database  from  one  model  to  anoth¬ 
er  and  is  known  as  database  transformation  [Bane80].  Furthermore,  there  is  the 
need  to  translate  the  requests  for  the  hierarchical  or  network  database  into  re¬ 
quests  for  a  relational  database.  Bequest  translation  is  also  a  part  of  gener¬ 
al  studies  of  translating  a  data  language  to  another  and  is  known  as  query  trans¬ 
lation  [Bane80],  To  the  best  of  our  knowledge,  complete  solutions  to  the  problems  of 
database  transformation  and  query  translation  among  the  aforementioned  thre^  models 
are  not  at  hand.  Hence,  more  research  needs  to  be  done  before  hierarchical  and  net¬ 
work  databases  may  be  supported  entirely  on  a  relational  system.  For  this  reason, 
we  do  not  prefer  the  relational  model.  Similar  reasons  may  be  raised  to  reject  the 
network  and  hierarchical  data  models.  Database  transformation  and  query  translation 
constitute  therefore  the  first  criterion,  i.e.,  the  translation  criterion,  for 
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selecting  a  data  model. 

There  are  two  more  criteria  that  a  data  model  must  satisfy  if  we  are  to 
implement  the  data  model  in  MDBS.  These  are  referred  to  as  the  partition 
criterion  and  the  language  criterion.  We  will  illustrate  these  two  criteria 
by  means  of  examples  in  the  sequel.  The  examples  will  also  illustrate  why  we 
choose  to  reject  the  network  and  hierarchical  data  models  for  MDBS  implementation. 

3.1.2  The  Partition  Criterion 

Consider  the  sample  network  data  model  in  Figure  10  where  an  inventory 
control  database  consists  of  four  record  types  (namely,  customer,  order,  item 
and  part)  and  three  set  types  (customer-order,  order-item  and  part-item). 

The  customer -order  set  links  each  customer  to  all  his  orders.  Thus,  Customer 
ABC  is  linked  to  orders  1,  2  and  3.  Corresponding  to  each  order,  there  can 
by  any  number  of  items.  Thus,  for  example,  items  1  and  2  correspond  to  order 
1.  Finally,  a  part  in  the  inventory  consists  of  a  number  of  items.  For  ex¬ 
ample,  part  2  consists  of  items  2,  3,  5,  7  and  8. 

A  typical  request  for  such  a  database  may  be  as  follows: 

'list  all  the  order  numbers  for  Customer  ABC'. 

A  conventional  network  database  system  would  respond  to  such  a  request  by  first 
retrieving  the  record  for  Customer  ABC  (with  the  use  of  a  hashing  function  on 
the  unique  customer  number)  and  then  following  the  chain  of  pointers  to  make 
three  additional  secondary  accesses  to  retrieve  the  records  for  orders  1,  2 
and  3.  Thus,  a  conventional  system  would  need  four  secondary  accesses  to  re¬ 
spond  to  this  request. 

Consider,  now,  how  a  system  like  MDBS  would  answer  this  request.  For  ex¬ 
pository  purposes,  let  us  assume  that  MDBS  consists  of  a  controller  and 
three  back-ends.  Now,  the  response  time  of  MDBS  to  this  particular  request 
would  depend  upon  the  manner  in  which  the  database  of  Figure  10  is  distributed 
across  the  three  back-ends.  If  the  record  corresponding  to  Customer  ABC,  and 
those  corresponding  to  orders  1,  2  and  3  are  all  stored  at  one  of  the  back¬ 
ends,  then  MDBS,  like  the  conventional  system,  would  also  need  four  ac¬ 
cesses  to  the  secondary  memory.  Clearly,  we  can  do  better  than  four. 

One  way  to  minimize  the  number  of  accesses  might  be  to  partition  the 
database  in  such  a  way  that  the  three  records  in  the  customer-order  set  with 
Customer  ABC  as  their  owner  are  placed  in  the  three  different  back-ends.  The 
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new  database  storage  is  now  shown  in  Figure  11.  Essentially,  we  have  parti¬ 
tioned  the  member  records  of  the  set  customer-order  by  the  back-ends.  Thus, 
any  request  which  requires  the  retrieval  of  all  members  in  any  customer-order 
set  may  be  easily  answered.  The  same  request 

'list  all  order  numbers  for  Customer  ABC' 
is  now  answered  in  two  accesses  to  the  secondary  memory,  instead  of  the  four 
needed  in  the  previous  environment. 

We  have,  however,  had  to  incorporate  some  redundancy  in  the  database. 

Thus,  the  Customer  ABC  record  is  now  to  be  stored  in  three  different  disks, 
the  Customer  DEF  record  is  to  be  stored  in  two  disks,  the  Part  1  record  is 
to  be  stored  in  two  disks  and  the  Part  2  record  is  to  be  stored  in  three  disks. 

The  record  storage  increases  by  a  factor  of  (24/18=)  1.5.  Similarly,  the  num¬ 
ber  of  pointers  has  gone  up  from  31  to  37 . 

In  addition  to  the  fact  that  the  total  amount  of  storage  may  go  up  is 
the  new  problem  of  updating  redundant  data.  Thus,  if  the  address  of  Customer 
ABC  is  changed,  it  must  now  be  changed  in  three  different  back-ends.  Hence, 
the  controller  must  be  aware  of  all  the  different  copies  of  a  record  and  must 
ensure  the  mutual  consistency  (Thom?9]  of  all  these  different  copies  during  up¬ 
date.  Algorithms  for  ensuring  the  mutual  consistency  of  these  different  copies 
are  rather  complicated.  Thus,  we  should  avoid  the  problem  if  possible. 

Although  the  above  solution  to  partitioning  the  network  database  has  re¬ 
sulted  in  a  faster  response  to  requests  for  all  members  in  any  customer-order 
set,  the  response  to  requests  for  all  members  in  an  order-item  set  is  as  slow 
as  in  any  conventional  system.  In  order  to  improve  the  response  time  to  such 
requests,  we  will  need  to  partition  the  member  records  of  all  order-item  sets 
by  the  back-ends.  This  causes  a  further  increase  in  the  amount  of  data  re¬ 
dundancy  . 

Finally,  we  see  that  the  insertion  of  a  new  record  into  the  database  will 
require  the  addition  of  other  data.  In  referring  to  Figure  11,  for  example, 
if  a  new  order  is  created  for  Customer  DEF  and  must  be  inserted  into  the  data¬ 
base,  it  should  be  inserted  into  back-end  3  in  order  to  ensure  an  even  dis¬ 
tribution  of  the  member  records  of  Customer  DEF  across  the  back-ends.  However, 
since  back-end  3  does  not  contain  a  record  for  Customer  DEF,  a  copy  of  the  Customer 
DEF  record  must  be  created  in  back-end  3  in  order  to  represent  the  relationship 
between  Customer  DEF  and  the  newly  Inserted  order  record. 

From  the  above  discussion,  we  learn  that  the  partitioning  of  a  network 
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database  for  MDBS  may  introduce  the  problems  of  storage  redundancy  and  mutual 
consistency.  A  data  model  is  easily  partitionable  if  the  partitioned  database 
stored  in  MDBS  for  the  model  leads  to  little  or  no  storage  redundancy. 

Consequently,  there  will  be  little  or  no  concern  for  the  problem  of  mutual  con¬ 
sistency  during  update.  Both  hierarchical  and  network  models  are  not  easily 
partitionable.  On  the  other  hand,  the  relational  model  is  easily  partition- 
able.  This  concludes  our  presentation  of  the  partition  criterion.  It  also 
illustrates  why  we  are  against  implementing  a  network  or  hierarchical  data 
model  in  MDBS. 

3.1.3  The  Language  Criterion 

Another  reason  for  not  implementing  the  hierarchical  or  network  data  model 
in  MDBS  is  related  to  the  data  manipulation  languages  associated  with  these  data 
models,  i.e.,  the  so  called  language  criterion.  It  is  evident  that  MDBS  will 
outperform  conventional  systems  for  requests  that  require  content-addressing  and 
retrieval  of  large  volumes  of  data,  because  the  data  can  be  spread  across  many 
back-ends  and  can  be  fetched  in  parallel.  However,  if  the  user  transaction  de¬ 
mands  records  in  a  sequential,  one-record-at-a-time  manner,  then  MDBS  cannot 
outperform  a  conventional  system  to  any  great  extent.  Unfortunately,  the  data 
manipulation  languages  associated  with  hierarchical  network  databases  tend  to 
manipulate  data  in  a  sequential,  one-record-at-a-time  manner.  Hence,  they  are 
unsuitable  for  MDBS  implementation.  The  ideal  language  for  MDBS  will  be  one 
which  is  highly  concurrent  and  which  requires  the  retrieval  of  large  volumes  of 
data.  We  shall  present  such  a  language  in  a  later  section. 

3.2  The  Attribute-Based  Model 

Having  eliminated  from  consideration  the  relational,  hierarchical  and  net¬ 
work  data  models,  we  shall  now  consider  the  attribute-based  data  model  [Hsia70, 

Roth 80,  Wong71].  In  {Bane77,  Bane78a,  Bane80] ,  several  contributions  are  made  with 
regard  to  the  attrib>’te-based  model.  First,  they  have  shown  that  any  relational, 
network  or  hierarchical  database  may  be  transformed,  in  a  straightforward  way, 
into  an  attribute-based  database.  Thus,  there  is  no  database  transformation 
problem.  They  have  also  demonstrated  that  the  requests  issued  in  the  data 
manipulation  languages  of  these  three  data  models  may  be  easily 
translated  into  the  requests  of  the  attribute-based  data  manipulation  language. 
Thus,  the  attribute-based  data  model  does  not  suffer  from  the  query  translation 
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problem.  Consequently,  unlike  the  relational,  hierarchical  and  network  data 
models,  the  attribute-based  model  meets  the  translation  criterion.  Why  is  it 
that  other  data  models  and  their  manipulation  languages  can  be  so  easily 
translated  into  the  attribute-based  model  and  its  manipulation  language?  The 
reason  has  to  do  with  the  fact  that  the  attribute-based  model  is  a  very  basic 
model  which  embodies  only  a  few  simple  concepts.  When  one  tries  to  transform 
the  database  of  a  complex  model  into  a  database  of  another  complex  model,  one 
must  use  the  complex  concepts  in  the  second  model  to  'emulate'  the  complex  con¬ 
cepts  of  the  first  model.  This  is  difficult  due  mainly  to  the  major  differences 
among  the  concepts  of  these  models.  For  example,  the  concept  of  a  set  of  the 
network  model  is  sufficiently  complex  as  to  make  it  difficult  to  find  its  counter¬ 
part  in  the  relational  and  hierarchical  models.  In  other  words,  it  is  not  easy 
to  find  concepts  in  the  relational  and  hierarchical  models  to  emulate  the 
concept  of  a  set.  On  the  other  hand,  being  basic,  the  concepts  of  the  attri¬ 
bute-based  model  may  be  used  as  building  blocks  for  the  more  complex  concepts 
of  the  aforementioned  three  data  models.  The  process  of  translating  a  complex 
concept  into  one  or  more  elementary  concepts  is  frequently  an  easier  task  than 
the  process  of  translating  a  complex  concept  to  one  or  more  complex  ones. 

Next,  the  attribute-based  data  model  does  not  suffer  from  the  partition 
problem.  We  recall  that  the  network  data  model  suffers  from  the  partition  pro¬ 
blem  because  its  partitioned  database  has  a  large  amount  of  data  redundancy. 

Such  redundancy  was  essentially  caused  by  the  fact  that  two  different  mechanisms 
are  used  to  represent  data  in  a  network  database,  where  entities  are  represented 
by  records  and  relationships  are  represented  by  pointers.  In  an  attribute-based 
model,  all  logical  concepts  (i.e.,  entities  and  relationships)  are  represented  by 
attribute-value  pairs.  Thus,  data  may  be  easily  partitioned  across  the  various 
back-ends  with  no  redundancy.  Finally,  the  data  manipulation  language  of  the 
attribute- based  model  does  not  operate  in  a  sequential,  record-at-a-time  manner. 

By  the  use  of  boolean  expressions  of  predicates  as  queries,  it  operates  in  a 
highly  concurrent  manner  thus  allowing  us  to  utilize  the  capabilities  of  MDBS 
to  the  fullest.  Accordingly,  we  choose  to  implement  the  attribute-based  model, 
since  it  meets  all  three  criteria,  namely,  the  translation,  partition  and 
language  criteria. 
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3.2.1  Concepts  and  Terminology 

The  smallest  unit  of  data  in  MDBS  is  a  keyword  which  is  an  attribute-value 
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pair,  where  the  attribute  may  represent  the  type,  quality,  or  characteristic 
of  the  value.  Information  is  stored  in  and  retrieved  from  MDBS  in  terms  of 
records;  a  record  is  made  up  of  a  collection  of  keywords  and  a  record  body. 

The  record  body  consists  of  a  (possibly  empty)  string  of  characters  which  are 
not  used  for  search  purposes.  For  logical  reasons,  all  the  attributes  in  a 
record  are  required  to  be  distinct.  An  example  of  a  record  is  represented 
below: 

(<File,EMP>,  < Job,MGR>,  <Dept,TOY>,  <Salary ,30000>) . 

The  record  consists  of  four  keywords.  The  value  of  the  attribute  Dept,  for 
instance,  is  TOY.  In  this  dissertation,  we  will  use  "<",  ">"  to  bracket  a 
at tribute- value  pair;  "(",")"  to  parenthisize  a  record;  character  strings 
with  leading  capitals  for  attributes;  numerals  or  all  capitals  for  values; 
and  commas  to  separate  the  attribute-value  pairs  of  a  record. 

MDBS  recognizes  several  kinds  of  keywords:  simple,  security  and  direc¬ 
tory.  Simple  keywords  are  intended  for  search  and  retrieval  purposes.  Secur¬ 
ity  keywords  are  intended  for  access  control  and  will  be  elaborated  fully  in 
Chapter  V.  Directory  keywords  are  used  for  forming  clusters.  A  cluster  of 
records  has  a  high  probability  of  being  retrieved  from  the  back-ends  together. 
Records  of  a  cluster  are  therefore  stored  in  close  proximity.  We  will  dis¬ 
cuss  the  concept  of  a  cluster  and  cluster  algorithms  in  chapter  IV.  In  this 
chapter,  we  present  the  attribute-based  model,  the  concept  of  simple  keyword 
and  its  data  manipulation  language. 

A  keyword  predicate,  or  simply  predicate,  is  of  the  form  (attribute,  re¬ 
lational  operator,  value).  A  relational  operator  can  be  one  of  {=,?t,>,?,< ,5} . 

A  keyword  K  is  said  to  satisfy  a  predicate  T  if  the  attribute  of  K  is  identical 
to  the  attribute  in  T  and  the  relation  specified  by  the  relational  operator  of 
T  holds  between  the  value  of  K  and  the  value  in  T.  For  example,  the  keyword 
<Salary,15000>  satisfies  the  predicate  (Salary>10000) . 

A  descriptor  can  be  one  of  three  types: 

Type-A:  The  descriptor  is  a  conjunction  of  a  less-than-or-equal-to  pre¬ 
dicate  and  a  greater-than-or-equal-to  predicate,  such  that  the 
same  attribute  appears  in  both  predicates.  An  example  of  a  type-A 
descriptor  is  as  follows: 

((Salary>2,000)  A  (SalarySlO, 000) ) . 

More  simply,  this  is  written  as  follows: 

( 2 , OOOSSalarySlO , 000 ) . 
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Thus,  for  creating  a  type-A  descriptor,  the  database  creator 
merely  specifies  an  attribute  (i.e.,  Salary)  and  a  range  of 
values  ($2,000  and  $10,000)  for  that  attribute.  We  term  the 
value  to  the  left  of  the  attribute  the  lower  limit  and  the 
value  to  the  right  of  the  attribute  the  upper  limit. 

Type-B:  The  descriptor  is  an  equality  predicate.  An  example  of  a 
type-B  descriptor  is: 

(Position=PR0FESS0R) . 

Type-C:  The  descriptor  consists  of  only  an  attribute  name,  known  as  the 
type-C  attribute.  Let  us  assume  that  there  are  n  different  key¬ 
words  Kj,  K2,  ...»  and  in  the  records  of  a  database  with  a 
type-C  attribute.  Then,  the  type-C  descriptor  is  really  equi¬ 
valent  to  n  type-B  descriptors  B1 ,  B2,  ...  and  Bjj,  where  B^  is 
the  equality  predicate  satisfied  by  K^.  In  fact,  this  type-C 
descriptor  will  cause  n  different  type-B  descriptors  to  be  formed. 
From  now  on,  we  shall  refer  to  the  type-B  descriptors  formed  from 
a  type-C  descriptor  as  type-C  sub-descriptors. 

For  instance,  consider  that  Dept  is  specified  as  a  type-C  attribute 
for  a  file  of  employee  records.  Furthermore,  let  all  employees  in 
the  file  belong  to  either  the  TOY  department  or  the  SALES  department. 
Then  two  type-B  descriptors  will  be  formed  for  this  file.  They  are 
(Dept=T0Y)  and  (Dept=SALES) . 

In  forming  descriptors,  the  database  creator  must  observe  certain  rules  as 
follows : 

(1)  Ranges  specified  in  type-A  descriptors  for  a  given  attribute  must  be 
mutually  exclusive. 

(2)  For  every  type-B  descriptor  of  the  form  (attribute-1  =  value  1),  no 
type-A  descriptor  can  have  the  same  attribute  (i.e.,  attribute-1)  and 
a  range  that  contains  the  same  value  (i.e.,  value-1). 

(3)  An  attribute  that  appears  in  a  type-C  descriptor  must  not  also  appear 
in  a  type-A  or  a  type-B  descriptor  defined  previously. 

(4)  Type-A  descriptors  are  specified  first;  Type-B  descriptors  next;  Type-C 
descriptors  last. 

A  keyword  is  said  to  be  der ived  or  derivable  from  a  descriptor  if  one  of  the 
following  holds: 

(a)  The  attribute  of  the  keyword  is  specified  in  a  type-A  descriptor  and 
the  value  is  within  the  range  of  the  descriptor. 
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(b)  The  attribute  and  value  of  the  keyword  match  those  specified  in  a 
type-B  descriptor. 

(c)  The  attribute  of  the  keyword  is  specified  in  a  type-C  descriptor. 

The  use  of  these  descriptors  will  be  demonstrated  in  Chapter  4. 

A  query  conjunction,  or  simply  conjunction,  is  a  conjunction  of  predicates. 

An  example  of  a  query  conjunction  is: 

(Salary>25000)  A  (Dept=TOY)  A  (Name=JAI) 

We  say  that  a  record  satisfies  &  query  conjunction  if  the  record  contains  key¬ 
words  that  satisfy  every  predicate  in  the  conjunction. 

A  query  is  a  boolean  expression  of  predicates.  An  example  of  a  query  is: 

( (Dept=TOY)  A  (SalarydOOOO))  v  ( (Dept=BOOK)  A  (Salary>50000)) 

3.2.2  The  Data  Manipulation  Language  (DML) 

The  data  manipulation  language  for  MDBS  is  a  non-procedural  language  which 
supports  four  different  types  of  requests  -  retrieve,  insert,  delete  and  update. 

The  syntax  of  these  various  requests  and  examples  of  them  are  presented  below. 

For  a  formal  specification  of  DML,  the  reader  may  refer  to  the  Appendix  A. 

A.  Retrieve 

The  syntax  of  a  retrieve  request  is: 

RETRIEVE  Query  Target-list  [BY  Attribute] [WITH  Pointer] 

That  is,  it  consists  of  five  parts.  The  first  part  is  the  name  of  the  request. 

The  second  part  is  the  query  (as  defined  in  Section  3.2)  which  characterizes  the 
portion  of  the  database  to  be  retrieved.  The  target-list  is  a  list  of  elements. 
Each  element  is  either  an  attribute,  e.g..  Salary,  or  an  aggregate  operator  to  be 
performed  on  an  attribute,  e.g.,  AVG(Salary).  We  will  support  five  aggregate 
operators  -  AVG,  SUM,  COUNT,  MAX,  MIN  -  in  MDBS.  An  example  of  a  target-list  of 
two  elements  is  (Dept,  AVG(Salary) ) .  The  values  of  an  attribute  in  the  target-list 
are  retrieved  from  all  the  records  identified  by  the  query.  If  no  aggregate  oper¬ 
ator  is  specified  on  the  attribute  in  the  target-list,  its  values  in  all  the  re¬ 
cords  identified  by  the  query  are  returned  directly  to  the  user  or  user  program. 

If  an  aggregate  operator  is  specified  on  the  attribute  in  the  target-list,  some 
computation  is  to  be  performed  on  all  the  attribute  values  in  the  records  identi¬ 
fied  by  the  query  and  a  single  aggregate  value  is  returned  to  the  user  or  user  pro¬ 
gram.  The  fourth  part  of  the  request,  referred  to  as  the  BY-clause,  is  optional 
as  designated  by  the  square  brackets  around  it.  The  use  of  the  By-clause  is  ex¬ 
plained  by  means  of  an  example.  Assume  that  employee  records  are  to  divided  into 
groups  on  the  basis  of  the  departments  for  the  purpose  of  calculating  the  average 
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salary  for  all  the  employees  In  a  department.  This  may  be  achieved  by  using  a  re 
trieve  request  with  the  specific  target  list,  (AVG (Salary) ) ,  and  the  specific  BY- 
clause,  BY  Department.  Finally,  the  fifth  part  of  the  request,  which  is  also 
optional,  is  a  WITH-clause  which  specifies  whether  pointers  to  the  retrieved  re¬ 
cords  must  be  returned  to  the  user  or  user  program  for  later  use  in  an  update 
request.  Some  examples  of  retrieve  requests  are  presented  below. 

Example  1.  Retrieve  the  names  of  all  employees  who  work  in  the  Toy 
Department . 

RETRIEVE  (File= EMPLOYEE)  A  (Dept=TOY)  (Name) 

Example  2.  Retrieve  the  names  and  salaries  of  all  employees  making  more 
than  $5000  per  year. 

RETRIEVE  (File= EMPLOYEE)  A  (Salary>5000)  (Name, Salary) 

Example  3.  Find  the  average  salary  of  an  employee. 

RETRIEVE  (File=EMPLOYEE)  (AVG (Salary)) 

Example  4.  List  the  average  salary  of  all  departments. 

RETRIEVE  (File=EMPLOYEE)  (AVG (Salary))  BY  Department 

B.  Insert 

The  syntax  of  an  insert  request  is: 

INSERT  Record 

where.  Record  is  the  record  to  be  inserted  into  the  database.  An  example  of 
an  insert  command  is : 

INSERT  (<Relat ion , EMPL0YEE> , < Salary , 5000> , <Dept , T0Y> ) 

C.  Delete 

The  syntax  of  a  delete  request  is: 

DELETE  Query 

where  Query  is  a  query  which  specifies  the  particular  records  to  be  deleted 
from  the  database.  An  example  of  a  DELETE  request  is: 

DELETE  (Name=HSIAO)  V  (Salary>50) 

D.  Update 

In  DML,  the  syntax  of  an  update  request  is: 

UPDATE  Query  Modifier 

where  Query  specifies  the  particular  records  from  the  database  to  be  updated 
and  Modifier  specifies  the  kinds  of  modification  that  need  to  be  done  on  re- 
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cords  that  satisfy  the  query.  An  update  in  MDBS  which  provides  for  the  modifica¬ 
tion  of  only  a  single  attribute  value  will  have  the  attribute  referred  to  as  the 
attribute  being  modified .  The  modifier  in  an  update  request  specifies  the  new 
value  to  be  taken  by  the  attribute  being  modified.  The  new  value  to  be  taken  by 
the  attribute  being  modified  is  specified  as  a  function  f  of  the  old  value  of 
either  that  attribute  (i.e.,  Type-I)  or  some  other  attribute  (Types  IT,  III  and  IV) 
The  modifier  may  be  one  of  the  following  five  types: 

Type-0:  <attribute=constant> 

Type-I:  ottribute^f  (attribute)  > 

Type-II:  <attribute=f (attributel)> 

Type-Ill:  <attribute*f (attributel)  of  Query> 

Type-IV:  <attribute"f (attributel)  of  Pointer> 

Let  a  record  being  modified  be  referred  to  as  the  record  being  modified. 

Then,  a  Type-0  modifier  sets  the  new  value  of  the  attribute  being  modified  to 
a  constant.  A  Type-I  modifier  sets  the  new  value  of  the  attribute  being  modi¬ 
fied  to  be  some  function  of  its  old  value  in  the  record  being  modified.  A 
Type-II  modifier  sets  the  new  value  of  the  attribute  being  modified  to  be  some 
function  of  some  other  attribute  value  in  the  record  being  modified.  A  Type-III 
modifier  sets  the  new  value  of  the  attribute  being  modified  to  be  some  function 
of  other  attribute  value  in  another  record  uniquely  identified  by  the  query  in 
the  modifier.  Finally,  a  Type-IV  modifier  sets  the  new  value  of  the  attribute 
being  modified  to  be  some  function  of  some  other  attribute  value  in  another 
record  identified  by  the  pointer  in  the  modifier.  An  example  of  a  Type-0 
modifier  Is: 

<Salary“5000> 

This  sets  the  salary  of  all  the  records  being  updated  to  5000. 

An  example  of  a  Type-1  modifier  is: 

<  Salary-1 . lxSalary> 

This  raises  the  salary  of  all  qualifying  employees  by  10%. 

An  example  of  a  Type-II  modifier  is: 

<Monthsal«Yearsal/12> 

This  sets  the  monthly  salary  of  all  qualifying  employees  to  be  a  twelfth  of 
their  own  yearly  salaries. 

An  example  of  a  Type-III  modifier  is: 

<Salary«*Salary  of  (Relation»WIFE)  A  (Name*TARA)>. 

An  example  of  a  Type-IV  modifier  is: 

<Salary*Salary  of  2000> 


I 


which  modifies  the  salary  of  all  qualifying  employees  to  that  of  the  record 
stored  in  location  2000.  In  order  to  use  this  type  of  modifier,  the  user  must 
have  previously  issued  a  retrieve  request  with  the  WITH  POINTER  option. 

An  example  of  a  complete  update  request  may  be: 

UPDATE  (File=EMPL0YEE)  <Salary=Salary+25> 
which  gives  a  raise  of  $25  to  all  employees. 

3.2.3  The  Notion  of  a  Transaction 

In  DML,  we  allow  the  flexibility  for  a  user  to  specify  a  set  of  requests 
for  repeated  execution.  Such  a  pre-specif ied  set  of  requests  shall  be  referred 
to  as  a  transaction.  As  in  other  systems,  a  transaction  must  preserve  consis¬ 
tency.  i.e.,  uphold  the  truthhood  of  the  assertions  made  about  the  database. 

A  database  creator  specifies  a  set  of  assertions  on  the  database.  These  as¬ 
sertions  are  constraints  which  must  be  satisfied  by  data  in  the  database.  For 
instance,  since  employees  cannot  have  negative  salaries,  an  assertion  on  the 
database  may  require  that  all  employees  have  positive  salaries.  An  assertion 
about  a  database  is  said  to  be  true  in  the  database  if  the  data  in  the  data¬ 
base  satisfies  the  constraints  in  the  assertion.  A  database  is  in  a  consistent 

state  if  all  the  assertions  made  on  the  database  by  the  database  creator  remain 
to  be  true  in  the  database.  Finally,  a  transaction  is  said  to  preserve  consis¬ 
tency,  if  the  database  is  in  a  consistent  state  before  it  begins  execution  and 
the  database  is  in  a  consistent  state  after  it  finishes  execution. 

Some  examples  of  transactions  in  our  environment  are  present  below.  We 
begin  each  transaction  with  BOT,  an  acronym  for  beginning-of-transaction  and  we 
terminate  each  transaction  with  EOT  for  end-of-transaction. 

Example  1.  We  assume  the  existence  of  two  files  in  a  database  [Eswa76], 
one  called  accounts  and  the  other  called  assets  depicted  in 
Figure  12.  The  only  assertion  made  by  the  database  creator 
on  this  database  is  that  the  total  assets  at  a  particular  lo¬ 
cation  are  equal  to  the  sum  of  the  balances  at  that  location. 

We  Assume  that  the  record 
(<Location,NAPA>,  <Number,5320>,  <Balance,287>) 

Is  to  be  Inserted  repeatedly  into  the  database.  Each  time  such 
an  insertion  takes  place,  the  sum  of  the  balances  at  location 
Napa  is  increased  by  $287.  Hence,  if  the  assertion,  that  the 
sum  of  the  balances  at  a  location  is  equal  to  the  total  assets 


ACCOUNTS 


ASSETS 


Location 

Number 

Balance 

Location 

Total 

NAPA 

32123 

1050 

ST  HELENA 

506 

ST  HELENA 

36592 

506 

NAPA 

1337 

NAPA 

5320 

287 

Figure  12.  A  Sample  Database  of  Two  Files 
(Adopted  from  [Eswa76]) 


at  that  location,  is  to  be  preserved,  then  the  total  for  the 
location  Napa  in  the  assets  file  must  also  be  increased  by 
$287.  A  transaction  that  does  this  is: 


BOT 

INSERT  (<Location,NAPA>,  <Number,5320>,  <Balance,287>) 

UPDATE  (File=ASSETS)  A  (Location-NAPA)  <Total=Total+287> 

EOT 

Example  2:  From  time  to  time,  employee  Smith  is  given  a  raise  of  $50. 

An  assertion  on  the  database  may  require  that  employee  Jones 
always  makes  the  same  salary  as  employee  Smith.  Hence,  the 
salary  of  Jones  must  also  be  raised  by  $50  whenever  the  salary 
of  Smith  is  raised  by  $50.  A  transaction  that  does  this  is: 

BOT 

UPDATE  (File=EMPLOYEE)  A  (Name=SMITH)  < Salary=Salary+50> 

UPDATE  (File=EMPLOYEE)  A  (Name*1  JONES)  <Salary=Salary+50> 

EOT 

3.2.4  Basic  Requests  vs.  Aggregate  Requests 

We  have  thus  far  described  the  various  basic  types  of  requests  of  MDBS. 
Detailed  algorithms  for  the  execution  of  the  basic  request  types  will  be  pre¬ 
sented  in  Chapter  4,  although,  in  Chapter  2  we  had  briefly  described  the  execu¬ 
tion  sequence  of  the  retrieve  requests  in  MDBS.  There,  we  had  stated  that  a  re¬ 
trieve  request  would  be  initially  broadcast  from  the  controller  to  all  the  back¬ 
ends  over  a  bus  and  that  the  results  would  be  output  by  each  back-end  to  the  con¬ 
troller  via  the  same  bus.  We  will  elaborate  on  the  performance  and  the  execution 
sequences  of  all  the  basic  types  of  requests  in  Chapter  4. 

When  an  aggregate  operator  is  included  in  a  request,  the  execution  of  the 
request  takes  additional  complexity,  since  considerable  computations  must  be  per¬ 
formed  by  MDBS  on  data  aggregates  for  the  request.  For  this  reason,  we  study  re¬ 
quests  employing  aggregate  operators.  From  now  on,  we  will  refer  to  requests 
using  aggregate  operators  as  aggregate  requests.  Furthermore,  we  will  analyze  the 
per Formance  of  MDBS  in  the  execution  of  aggregate  requests  in  Chapter  4. 

3.2.5  Tn  Meeting  the  Selection  Criteria 

In  the  beginning  of  this  chapter,  we  stated  three  criteria  which  were  to  be 
satisfied  by  a  data  model  and  the  data  manipulation  language  of  the  data  model. 
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The  three  criteria  were  intended  for  the  selection  of  an  ideal  data  model  and 
its  data  manipulation  language  for  MDBS  implementation.  These  criteria  were  re¬ 
ferred  to  as  the  translation  criterion,  the  partition  criterion  and  the  language 
criterion. 

The  attribute-based  model  satisfies  the  selection  criteria  as  we  amply  ar¬ 
gued  in  the  preceeding  sections.  In  this  section,  we  consider  its  data  manipu¬ 
lation  language,  DML,  in  terms  of  the  selection  criteria. 

Of  the  three  criteria,  the  partition  criterion  is  to  be  met  by  the  model 
alone.  Hence,  we  do  not  need  to  consider  this  criterion  for  DML.  We  shall 
consider  the  remaining  criteria  for  DML  in  the  following  paragraphs. 

In  [Bane80] ,  it  was  demonstrated  that  the  data  manipulation  languages  of 
the  three  prevailing  data  models  -  network,  hierarchical  and  relational  -  could 
be  translated  into  the  data  manipulation  language  of  a  database  computer,  known 
as  DBC.  The  data  manipulation  language  proposed  in  this  chapter,  i.e.,  DML, 
properly  contains  all  the  features  of  the  data  manipulation  language  of  DBC. 
Hence,  if  the  data  manipulation  languages  of  the  three  prevailing  data  models  can 
be  translated  into  the  data  manipulation  language  of  DBC,  they  can  also  be  trans¬ 
lated  into  DML.  Thus,  DML  satisfies  the  translation  criterion. 

Finally,  DML  allows  for  the  concurrent  access  of  large  volumes  of  data  be¬ 
cause  it  enables  the  user  to  specify  queries  in  terms  of  boolean  expression  of 
predicates.  Consequently,  DML  satisfies  the  language  criterion. 

In  conclusion,  the  proposed  DML  satisfies  all  the  criteria  indicated  in  the 
beginning  of  this  chapter. 
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4.  THE  PROCESS  OF  REQUEST  EXECUTION 

In  Chapter  3,  we  had  described  the  data  manipulation  language,  DML,  used  in 
MDBS.  In  this  Chapter,  we  will  describe  the  process  of  DML  request  execution 
from  the  time  a  DML  request  is  first  received  by  MDBS  to  the  time  the  response 
set  of  the  request  is  returned  to  the  user  or  user  program. 

Central  to  the  discussion  on  request  execution  is  the  notion  of  a  cluster. 

This  notion  will  be  developed  in  Section  4.1.  Here,  we  give  a  brief  introduction 
to  the  process  of  request  execution.  We  first  utilize  the  descriptors  defined 
in  Chapter  3  as  an  equivalence  relation  to  partition  the  database  into  equiva¬ 
lence  classes  which  are  termed  clusters.  The  equivalence  relation  guarantees 
the  following  nice  properties:  Every  record  in  the  database  belongs  to  one  and 
only  one  cluster  of  the  database.  By  proper  use  of  the  descriptors,  the  clusters 
may  be  formed  in  such  a  way  that  if  a  user  needs  to  access  a  record  belonging  to 
a  cluster,  the  user  is  most  likely  to  have  the  need  to  access  all  the  other  re¬ 
cords  belonging  to  that  cluster.  Thus,  clusters  serve  as  the  basic  units  of  ac¬ 
cess  in  MDBS  and  every  database  is  stored  in  MDBS  as  a  collection  of  clusters. 

The  execution  of  a  user  request  in  MDBS  proceeds  typically  in  several  distinct 
phases  as  follows.  In  the  first  phase,  MDBS  will  determine  the  exact  clusters 
of  records  which  will  satisfy  the  request.  In  the  second  phase,  MDBS  accesses 
security  information  about  the  user  in  order  to  select  for  the  user  the  author¬ 
ized  clusters  among  the  clusters  which  have  been  determined  in  the  first  phase. 

In  the  third  phase,  MDBS  determines  the  secondary  memory  addresses  of  the  author¬ 
ized  clusters  selected  in  the  second  phase.  In  all  these  three  phases,  MDBS 
makes  n£  access  to  the  records  of  the  clusters  selected.  Instead,  MDBS  utilizes 
auxiliary  information  about  the  clusters  and  security.  Such  utilization  of  auxi¬ 
liary  information  constitutes  the  directory  management  function  of  MDBS.  After 
the  three  phases  of  directory  management,  MEBS  retrieves  the  clusters  of  records 
which  will  satisfy  the  request. 

The  use  of  the  data  placement  strategy  in  MDBS  will  ensure  us  that  each 
cluster  (database)  is  stored  in  such  a  way  that  the  records  (clusters)  of  the 
cluster  (database)  are  evenly  distributed  across  the  multiple  back-ends  of  MDBS. 

Since  all  clusters  are  evenly  distributed  across  the  back-ends,  the  response  set 
of  the  request  will  be  retrieved  in  a  parallel  fashion  from  the  back-ends. 

This  completes  our  brief  description  of  the  processing  of  a  request  in  MDBS.  The 
remaining  sections  of  this  Chapter  are  organized  as  follows.  In  Section  4.1,  ve 
will  develop  the  notion  of  a  cluster.  The  data  structures  to  be  used  for  deter- 


-79- 


mining  the  set  of  clusters  which  will  satisfy  a  request,  and  the  algo¬ 
rithms  necessary  for  such  determination  are  also  described  in  Section 
4.1.  In  keeping  with  our  findings  in  Chapter  2,  each  cluster  is  placed 
according  to  the  track-splitting-with-random  placement  policy  which  was 
shown  to  be  superior.  In  Section  4.2,  we  will  describe  two  of  the  three 
phases  of  the  directory  management.  Phase  2  is  dealing  with  security- 
related  directory  management.  Because  of  its  importance  and  our  new  con¬ 
tribution,  it  is  discussed  in  Chapter  5.  Several  strategies  for  perform¬ 
ing  directory  management  are  proposed  and  evaluated  on  the  basis  of  a 
criteria  which  strives  for  minimal  processing  time.  These  same  strategies 
are  also  evaluated,  in  Section  4.2,  in  terms  of  storage  requirements.  At 
the  end  of  Section  4.2,  we  will  present  our  recommendations  for  a  direc¬ 
tory  management  strategy  for  MDBS.  In  Section  4.3,  we  will  describe  the 
execution  sequences  for  the  various  DML  requests  that  may  be  issued  to 
MDBS. 


4 . 1  The  Notion  of  Record  Clusters 

Record  clusters  are  formed  for  the  purposes  of  narrowing  the  search  space 
and  minimizing  the  effort  needed  to  retrieve  records  which  may  satisfy  a  given 
request.  In  other  words,  by  organizing  a  database  into  clusters  and  by  maintain¬ 
ing  information  about  these  clusters,  MDBS  may  readily  identify  those  clusters 
whose  records  will  satisfy  the  given  request,  thereby  achieving  high  throughput 
and  good  response  time. 

Although  the  notion  of  a  record  cluster  for  the  above-mentioned  purposes 
is  well  known,  the  effectiveness  of  clusters  for  throughput  gain  and  response¬ 
time  improvement  lies  in  the  effectiveness  of  the  clustering  algorithm  for  forming 
clusters  and,  more  importantly,  the  placement  strategy  for  storing  these  clusters. 
In  other  words,  it  depends  on  how  clusters  are  formed  and  placed.  Interestingly 

enough,  it  does  not  depend  on  how  clusters  are  used.  In  other  words,  the  through¬ 
put  and  response  time  of  MDBS  are  ’immune’  from  the  way  the  clusters  are  utilized. 

This  is  because  every  request  execution  by  MDBS  will  involve  the  search  and  re- 
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trieval  of  clusters.  Such  search  and  retrieval  can  always  be  shown  to  be 
maximal  for  throughput  gain  and  response-time  Improvement.  We  will  comment 
on  this  point  briefly  herein  and  elaborate  on  the  point  thoroughly  in  the 
later  sections.  Briefly,  this  is  due  to  our  use  of  the  descriptors  as  a  means 
to  define  and  form  clusters.  As  we  recall  from  Chapter  3,  a  descriptor  is 
either  a  single  predicate  or  a  conjunction  of  predicates.  We  may  also  re¬ 
call  that  a  query  in  a  user  request  is  a  boolean  expression  of  predicates. 
Thus,  a  given  user  request  will  require  the  retrieval  of  data  which  sati¬ 
sfy  the  predicates  of  the  expression.  Since  clusters  are  formed  by  the  de¬ 
finition  of  descriptors  and  both  descriptors  and  queries  utilize  the  com¬ 
mon  notion  of  predicates,  the  data  retrieved  for  the  request  are  actually 
one  or  more  clusters.  Clusters  therefore  become  the  ideal  formation  (or 
unit)  of  data  for  storage  and  retrieval  and  for  performance  optimization. 

Our  work  on  clusters  is  unique  also  in  a  couple  of  respects.  First, 
the  clustering  algorithms  in  our  method  will  be  executed  in  multiple  back¬ 
ends  rather  than  at  a  single  back-end  as  in  all  other  work.  Second,  we 
recongnize  the  importance  which  must  be  given  to  the  placement  of  these 
clusters  across  the  multiple  back-ends  of  MDBS. 

In  the  following  sections,  we  will  describe  how  the  clusters  are  formed 
in  MDBS  and  how  they  are  used.  We  will  begin  with  some  definitions. 


4.1.1  Cluster  Formation 

For  a  database,  the  creator  of  the  database  specifies  a  number  of  descriptors 
called  clustering  descriptors,  or  simply,  descriptors.  An  attribute  that  appears 
in  a  descriptor  is  called  a  directory  attribute.  We  say  that  a  directory  attri¬ 
bute  belongs  to  a  descriptor  if  the  attribute  appears  in  that  descriptor. 

We  recall  that  a  record  consists  of  attribute-value  pairs  or  keywords.  For 
purposes  of  clustering,  only  those  keywords  of  the  record  which  contain  directory 
attributes  are  considered.  Such  keywords  of  the  record  are  termed  directory  key¬ 
word  s .  From  the  rules  for  forming  descriptors  specified  in  Chapter  3,  it  is  easy 
to  see  that  a  directory  keyword  is  derivable  from  at  most  one  descriptor. 

For  example,  consider  a  database  with  Salary  as  the  only  directory  attribute. 
Furthermore,  let  (0<Salary2500)  be  the  only  descriptor  D1  on  Salary  specified  by 
the  database  creator.  Now,  consider  two  records,  one  containing  the  directory 
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keyword  <Salary,250>  and  the  other  containing  the  directory  keyword  <Salary,750> . 
Clearly,  the  former  directory  keyword  is  derivable  from  descriptor  D1  and  the 
latter  directory  keyword  is  not  derivable  from  Dl.  Hence,  the  latter  keyword  is 
not  derivable  from  any  descriptor  in  the  database  and  we  say  that  the  directory 
keyword  is  derivable  from  no  descriptor.  Since  a  record  may  have  many  directory 
keywords,  each  of  which  will  be  derivable  from  at  most  one  descriptor,  we  say 
that  the  record  is  derived  from  a  set  of  descriptors.  It  is  possible  for  a  re¬ 
cord  to  be  derived  from  the  empty  set  of  descriptors.  There  are  two  such  cases. 

In  the  first  case,  it  may  happen  that  a  record  does  not  contain  any  directory 
keyword.  In  this  case,  it  is  said  that  the  record  is  derived  from  the  empty 
set  of  descriptors.  Thus,  going  back  to  the  previous  example  with  the  single  di¬ 
rectory  attribute,  Salary,  and  the  single  descriptor,  (05SalaryS500) ,  a  record 
which  does  not  contain  any  salary  information  (i.e.,  no  keyword  with  the  attribute 
Salary)  is  said  to  be  derived  from  the  empty  set  of  descriptors.  The  second 
case  in  which  a  record  is  derived  from  the  empty  set  of  descriptors  is  when 
the  record  does  indeed  contain  directory  keywords,  but  these  keyword s  are  not 
derivable  from  any  descriptors.  In  the  previous  example,  a  record  with  the 
directory  keyword  <Salary,750>  which  is  not  derivable  from  the  descriptor  is 
therefore  derived  from  the  empty  set  of  descriptors  also. 

If  two  records  are  derived  from  the  same  set  of  descriptors,  they  are  likely 
to  be  retrieved  together  in  response  to  a  user  request,  since  these  two  records 
have  keywords  which  are  derived  from  the  same  set  of  descriptors.  Thus,  these 
two  records  should  be  stored  together  in  the  same  cluster.  A  cluster  is,  there¬ 
fore,  a  group  of  records  such  that  every  record  in  the  cluster  is  derived  from 
the  same  set  of  descriptors.  We  say  that  a  record  cluster  is  defined  by  the  set 
of  descriptors  from  which  all  records  in  the  cluster  are  derived. 

It  is  easy  to  see  that  a  record  belongs  to  one  and  only  one  cluster.  The 
reasoning  is  as  follows.  A  record  consists  of  zero  or  more  directory  keywords. 

If  it  consists  of  zero  directory  keywords,  it  belongs  to  the  cluster  defined  by 
the  empty  set  of  descriptors.  If  the  record  consists  of  one  or  more  directory 
keywords,  then,  the  record  must  be  derived  from  one  and  only  one  set  of  descrip¬ 
tors,  since  each  directory  keyword  is  derived  from  at  roost  one  descriptor.  This 
unique  set  of  descriptors  defines  the  unique  cluster  to  which  the  record  belongs. 
Thus,  we  have  used  the  concept  of  descriptor  sets  to  partition  the  database 
into  equivalence  classes,  namely  clusters.  A  formal  proof  of  the  above  obser¬ 
vations  is  included  in  an  Appendix  8. 

In  order  to  form  clusters  for  the  records  in  a  database,  an  algorithm  is 


provided  herein  which  will  take  a  record  and  determine  its  cluster.  We  will 
describe  this  algorithm  informally  below.  The  detailed  algorithm  to  be  im¬ 
plemented  in  MDBS  for  cluster  formation  will  be  presented  in  Appendix  C. 

The  algorithm  for  determining  the  cluster  to  which  a  record  belongs  is  as 
follows.  For  each  attribute-value  pair  in  the  record,  determine  if  the  at¬ 
tribute  is  a  directory  attribute.  If  it  is  not,  then  that  attribute-value  pair 
is  not  used  for  cluster  determination.  If  the  attribute  is  a  directory  attri¬ 
bute,  determine  the  descriptor,  if  any,  from  which  it  is  derived.  We  refer  to 
this  descriptor,  if  any,  as  the  corresponding  descriptor  for  the  given  attribute- 
value  pair.  The  set  of  corresponding  descriptors  for  all  the  attribute-value 
pairs  in  a  record  defines  the  cluster  to  which  the  record  belongs. 

By  using  the  algorithm  on  every  record  of  a  database  at  database-creation 
time,  we  may  form  the  record  clusters  of  the  database. 

4.1.2  An  Example  of  Cluster  Formation 

Consider  the  database  and  its  descriptors  shown  in  Figure  13.  The  figure 
shows  two  files,  accounts  and  assets.  The  accounts  file  has  three  records  and 
the  assets  file  has  two  records.  Also,  the  figure  shows  the  five  descriptors 
specified  on  this  database  by  the  database-creator.  The  attributes.  Number  and 
Balance,  are  type-A  descriptors  and  the  attribute.  Location,  is  a  type-C  descrip¬ 
tor.  This  type-C  descriptor  will  be  converted  into  two  type-B  descriptors  as 
shown  in  Figure  14.  There  Is  no  descriptor  for  the  attribute.  Total,  because  no  re 
quest  with  the  attribute.  Total,  as  part  of  a  query  is  expected.  The  clusters 
formed  for  this  database  and  the  records  in  each  cluster  are  depicted  in  Figure  15. 

The  first  row  in  Figure  15  is  for  cluster  1.  The  set  of  descriptors  defining 
this  cluster  is  {D2,  D3,  D5}.  Consider  record  R1  which  belongs  to  this  cluster. 

It  contains  the  keyword  <Location,  NAPA>  which  is  derived  from  the  descriptor  D5 
(Location=NAPA) .  Similarly,  the  keyword  <Number ,32123>  is  derived  from  D2  which 
is  (150005Number<m) .  Finally,  the  keyword  <Ba1 ance, 50>  is  derived  from  D3  which 
is  (05Balance<500) .  Hence,  R1  belongs  to  the  cluster  defined  by  D2,  D3  and  D5. 
Similarly,  it  may  be  shown  that  R3  belongs  to  the  same  cluster.  This  explains 
the  first  row  in  Figure  15.  With  similar  exercises,  we  may  verify  the  remaining 
three  rows  in  Figure  15. 


Accounts  file: 

(<Location,NAPA> ,  <Number,32123>,  <Balance,50>) 
(<Location,St.  Helena>,  <Number ,5320>,  <Balance,506>) 
(<Location,NAPA>,  <Number,36592>,  <Balance,287>) 


Assets  file: 

(<Location,St.  Helena>,  <Total,506>) 
(<Location,NAPA>,  <Total,337>) 


The  database  creator  specifies  the  following  descriptors: 

Type-A  Descriptors: 

(0  5  Number  <  15000) 

(15,000  5  Number  <  *) 

(0  5  Balance  <  500) 

(500  <  Balance  <  1000) 

Type-C  Descriptor:  Location 


*  For  simplicity,  we  refer  to  the  three  records  of  the  accounts 
file  as  Rl,  R2  and  R3,  respectively;  and  to  the  two  records  of 
the  assets  file  as  R4  and  R5,  respectively. 


i 


Figure  13.  A  Database  of  Two  Files  and 
its  Clustering  Descriptors 
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Descriptors  Specified  by  the  Database-Creator  as  seen  by  MDBS 


Descriptor 

Descriptor  id 

(0  -  Number  <  15,000) 

D1 

(15,000  -  Number  <  <*) 

D2 

(0  ^  Balance  <  500) 

D3 

(500  -  Balance <  1000) 

D4 

(Location  =  Napa) 

D5 

(Location =  St.  Helena) 

D6 

The  Descriptors  Formed  for  the 
Database  of  Figure  13 


Figure  14.  The  Descriptor-to-Descriptor-Td  Table  (DDIT) 


-85  - 


The  Clusters  Formed  for  the  Database  of  Figure  13 


Cluster 

Set  of  Descriptors* 

Records  in  the  Cluster  Defined 
by  the  Descriptor  Set** 

1 

{D2,  D3,  D5> 

Rl,  R3 

2 

{Dl,  D4,  D6> 

R2 

3 

{06} 

R4 

4 

{05} 

R5 

*  Actually,  only  descriptor  Ids  are  shown  here, Together  with  the 

Descrlptor-To-Descriptor-ld  Table  shown  in  Figure  14,  MDBS  maintains 
the  descriptor  sets. 

**  In  implementation,  this  column  contains  the  secondary  memory  addresses 
of  Rl,  R2,  R3,  R4  and  R5  with  two  addresses  in  the  first  row. 


Figure  15.  The  Cluster-Definition  Table  (CDT) 
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4.1.3  Clusters  Determination  During  Request  Execution 

Up  to  this  point,  we  have  been  describing  the  process  of  cluster  formation. 
We  will  now  explain  how  clusters  are  used  during  request  execution.  More  speci¬ 
fically,  we  will  explain  how  to  determine  the  cluster  to  which  a  new  record  be¬ 
longs  and  how  to  determine  the  set  of  clusters  which  must  be  retrieved  in  order 
to  satisfy  a  query  for  retrieval,  deletion  or  update. 

During  the  process  of  cluster  formation  described  in  the  previous  section, 
MDBS  uses  an  algorithm  repeatedly  for  determining  the  cluster  of  a  record  in  the 
database.  This  same  algorithm  may  now  be  used  by  MDBS  to  determine  the  cluster 
of  a  record  for  the  record  insertion.  In  insertion,  the  cluster  definition  table 
(CDT)  is  used  in  order  to  determine  the  secondary  memory  address  (addresses)  of 
this  cluster. 

Next,  let  us  describe  how  MDBS  determines  the  set  of  clusters  which  satisfy 
the  query  in  a  retrieval,  deletion  or  update  request.  Before  we  may  do  this,  we 
must  introduce  some  concepts  and  terminology. 

Descriptor  X  is  defined  to  be  less  than  descriptor  Y,  if  the  attributes  in 
both  descriptors  are  the  same  and  one  of  the  following  holds. 

(1)  Both  descriptors  are  of  type-A  and  the  upper  limit  of  descriptor  X  is 
lower  than  the  lower  limit  of  descriptor  Y. 

(2)  Both  descriptors  are  of  type-B  and  the  value  in  descriptor  X  is  smaller 
than  the  value  in  descriptor  Y. 

(3)  Descriptor  X  is  of  type-A  and  descriptor  Y  is  of  type-B  and  the  upper 
limit  of  descriptor  X  is  lower  than  the  value  in  descriptor  Y. 

(4)  Descriptor  X  is  of  type-B  and  descriptor  Y  is  of  type-A  and  the  value 
in  descriptor  X  is  smaller  than  the  lower  limit  of  descriptor  Y. 

An  exactly  parallel  description  for  the  gr eater-than  relation  among  descrip¬ 
tors  may  also  be  given.  The  above  definition  covers  the  case  where  either  X  or 
Y  is  a  type-C  descriptor,  since  type-C  descriptors  are  stored  as  type-B  descrip¬ 
tors  in  MDBS. 

To  illustrate  the  definition  of  less-than  among  descriptors,  let  us  assume 
that  we  are  given  the  descriptors  Dl  (1005Salary<200) ,  D2  (0<Salary<99) , 

D3  (Salary=99)  and  D4  (Salary®200) .  Thus,  D3  is  less  than  Dl;  D2  is  less  than 
D3;  and  Dl  is  less  than  D4.  The  relation,  less-than,  is  transitive.  Hance,  we 
can  define  a  partial  ordering  among  descriptors.  For  example,  the  ordering 
among  these  four  descriptors  is  D2,  D3,  Dl  and  D4. 

Using  the  above  definition  of  less-than  and  gr eater-than  for  the  descriptors, 
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we  are  ready  to  describe  the  algorithm  for  determining  the  corresponding  set  of 
clusters  for  a  query  in  a  user  request.  The  query  is  assumed  to  be  in  disjunctive 
normal  form,  i.e.,  disjunction  of  conjunctions.  The  algorithm  will  proceed  in 
three  steps. 

Since  a  query  conjunction  consists  of  predicates,  we  will  determine,  in  the 
first  step,  a  corresponding  descriptor  or  a  corresponding  set  of  descriptors  for 
each  predicate.  This  is  done  as  follows.  If  the  predicate  in  a  query  conjunction 
is  an  equality  predicate,  then  the  corresponding  descriptor  is  the  one  from  which 
the  keyword  satisfying  the  predicate  is  derived.  For  example,  if  the  predicate 
is  Location“NAPA,  then  the  keyword  satisfying  the  predicate  is  <Location,NAPA> 
and  the  corresponding  descriptor  is  (Location“NAPA) .  If  the  predicate  is  either 
a  less-than  or  less-than-or-equal-to  predicate,  it  is  first  treated  as  an  equal¬ 
ity  predicate  and  the  corresponding  descriptor  D  for  that  equality  predicate  is 
first  determined.  Then,  all  the  descriptors  less  than  D,  along  with  D,  form 
the  corresponding  set  of  descriptors  for  the  less-than  or  less-than-or-equal-to 
predicate.  If  the  predicate  is  a  greater-than  or  greater-than-or-equal-to  pre¬ 
dicate,  then  it  is  first  treated  as  an  equality  predicate  and  the  corresponding 
descriptor  D  for  that  equality  predicate  is  first  determined.  Then,  all  the 
descriptors  greater  than  D,  along  with  D,  form  the  corresponding  set  of  descrip¬ 
tors  for  the  greater-than  or  greater-than-or-equal-to  predicate.  Thus,  we  have 
determined  a  corresponding  set  of  descriptors  for  a  predicate. 

The  above  procedure  is  repeated  for  every  predicate  in  the  query  conjunc¬ 
tion.  Thus,  we  will  have  determined  a  corresponding  set  of  descriptors  for 
every  predicate  in  a  query  conjunction. 

Our  next  step  is  to  determine  the  corresponding  set  of  clusters  for  a 
query  conjunction,  since  a  query  consists  of  one  or  more  query  conjunctions.  Let 
the  query  conjunction  have  p  predicates.  Let  the  set  of  descriptors  corres¬ 
ponding  to  the  i-th  predicate  be  S^.  Now,  form  all  possible  groups,  where  each 
group  consists  of  one  descriptor  from  for  i  ranging  from  1  to  p.  In  other 
words,  we  are  forming  the  cross-product  of  S^.  The  reason  for  forming  this  cross- 
product  of  p  sets  is  because  a  query  conjunction  consists  of  a  conjunction  of  p 
predicates,  each  of  which  has  a  corresponding  set  Sj  of  descriptors.  Each  element 
in  this  cross-product  is  termed  a  descriptor  group  which  is  of  course  a  set  of 
descriptors.  Intuitively,  a  group  defines  a  set  of  clusters  whose  records  satisfy 
the  query  conjunction. 

We  recall  that  MDBS  maintains  a  table,  known  as  the  cluster  definition  table. 
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which  is  created  at  the  database  creation  time.  (See  Figure  15  again  for  an 
example).  However,  the  definitions  kept  in  the  table  may  not  be  identical  to 
the  definitions  of  the  groups.  Without  relating  the  descriptor  groups  with 
the  descriptor  sets  kept  in  the  table,  we  may  not  be  able  to  determine  the 
clusters  involved.  Thus,  this  second  step  includes  the  determination  of  whether 
there  are  descriptor  sets  in  the  table  which  contain  a  descriptor  group.  If 
there  are  such  sets,  then  the  clusters  defined  by  the  descriptor  sets  are  in¬ 
deed  the  clusters  referred  to  by  the  descriptor  group. 

By  repeating  this  procedure  for  every  descriptor  group  in  the  cross-product, 
we  are  able  to  determine  the  corresponding  set  of  clusters  for  a  query  con¬ 
junction.  The  entire  second  step  which  is  used  to  determine  the  corresponding 
set  of  clusters  for  a  query  conjunction  is  then  repeated  for  every  query  con¬ 
junction  in  the  query.  Thus,  we  have  determined  a  corresponding  set  of  clus¬ 
ters  for  every  query  conjunction  in  the  query. 

The  final  step  of  the  algorithm  determines  the  corresponding  set  of  clusters 
for  the  query  from  the  corresponding  set  of  clusters  for  each  query  conjunction 
in  the  query.  Since  the  query  is  a  disjunction  of  conjunctions,  the  correspond¬ 
ing  set  can  be  simply  obtained  as  the  union  of  the  sets  of  clusters  for  each 
query  conjunction  in  the  query. 

The  three  steps  involved  in  this  algorithm  are  illustrated  with  an  example 
in  the  following  section  and  formally  specified  in  Appendix  C. 

4.1.4  An  Example  of  Clusters  Determination  During  Request  Execution 

Our  example  is  developed  around  the  database  and  descriptors  of  Figures  13 
and  14.  Consider  that  the  following  retrieve  request  is  received  by  MDBS. 

RETRIEVE  ((Location=St. Helena)  A  (Balance=506) )  V  (Number<5500) (Balance) 

Clearly,  the  corresponding  descriptor  for  the  predicate  (Location*St.  Helena)  is 
D6  and  that  for  the  predicate  (Balance*506)  is  D4.  Similarly,  the  corresponding 
descriptor  for  the  predicate  (Number<5500)  is  Dl.  We  complete  the  first  step  of 
the  algorithm  for  determining  the  corresponding  set  of  descriptors  for  each  pre- 
d icate . 

In  the  next  step,  we  need  to  determine  the  corresponding  set  of  clusters 
for  each  query  conjunction.  Consider  the  first  query  conjunction,  (Location® 

St.  Helena)  A  ( Balance* 506 ) .  The  only  descriptor  group  that  can  be  formed  for 
this  query  conjunction  is  {D6,  D4}.  In  searching  the  entries  of  the  descriptor 
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definition  table  depicted  in  Figure  15,  we  discover  that  there  is  only  one 
descriptor  set  which  contains  the  descriptor  group.  It  is  the  set  {Dl,  D4,  D6}. 

The  cluster  defined  by  {Dl,  D4,  D6},  i.e.,  cluster  2,  is  the  only  member  of  the 
corresponding  set  of  clusters  for  the  first  query  conjunction.  Similarly,  we 
determine  the  corresponding  set  of  clusters  for  the  second  query  conjunction, 
(Number<5500) .  It  so  happens  that  cluster  2  is  also  the  only  member  of  the  correspond¬ 
ing  set  of  clusters  for  the  conjunction.  Thus,  we  complete  the  second  step  of 
the  algorithm. 

In  the  final  step  of  the  algorithm,  the  union  of  all  members  of  the  corres¬ 
ponding  sets  of  clusters  of  the  query  conjunctions  is  still  cluster  2.  Thus, 
cluster  2  constitutes  the  only  member  of  the  corresponding  set  of  clusters  for 
the  given  query.  Once  the  corresponding  set  of  clusters  is  determined,  the 
addresses  of  the  records  in  the  clusters  may  be  used  for  access  to  the  secondary 
memory . 


4.2  Directory  Management 

The  entire  sequence  of  actions  taken  by  MDBS  from  the  time  a  record-insertion 
request  is  received  to  the  time  that  the  cluster  to  which  the  record  is  to  be  in¬ 
serted  Is  determined  (i.e.,  the  secondary  memory  address  or  addresses  are  generated) 
is  referred  to  as  directory  management  for  an  insert  request.  Similarly,  the  en¬ 
tire  sequence  of  actions  taken  by  MDBS  from  the  time  a  retrieve,  delete  or  update 
request  is  received  to  the  time  the  corresponding  set  of  clusters  for  the  query  in 
the  request  and  their  addresses  in  the  secondary  memory  are  generated  is  referred 
to  as  directory  management  for  a  non- insert  request.  Together,  they  constitute  the 
directory  management  of  MDBS. 

We  repeat  that  the  directory  management  in  MDBS  consists  of  three  major 
phases.  In  the  first  phase,  MDBS  determines  the  exact  clusters  of  records 
which  will  satisfy  the  user  request.  This  phase  was  described  in  the  previous 
section.  The  algorithm  used  in  this  phase  for  determining  the  clusters  was 
briefly  described  in  the  previous  section  and  in  detail  in  an  appendix.  In  the 
second  phase,  MDBS  accesses  security  information  about  the  user  in  order  to  se¬ 
lect  for  the  user  the  authorized  clusters  among  the  clusters  which  have  been 
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determined  in  the  first  phase.  This  discussion  is  relegated  to  Chapter  5.  In 
the  third  phase,  MDBS  determines  the  secondary  memory  addresses  of  the  author¬ 
ized  clusters  selected  in  the  second  phase. 

4.2.1  Two  phases  of  Processing  -  Descriptor  Processing  and  Address  Generation 

For  implementation  of  the  directory  management  function,  we  recall  that 

the  cluster  definition  table  CDT  is  used  to  determine  the  corresponding  set  of 
clusters  for  a  query.  At  the  same  time,  the  secondary  memory  addresses  of  the 
corresponding  set  of  clusters  may  also  be  found, since  these  addresses  are  pre¬ 
sent  in  the  third  column  of  CDT  (see  again  Figure  15).  However,  we  want  to 
ensure  that  only  the  secondary  memory  addresses  of  the  authorized  clusters  for 
a  user  are  utilized.  This  is  achieved  by  augmenting  CDT  of  Figure  15  with 
several  more  columns  of  security-related  information,  one  column  for  each  user 
of  the  database.  The  details  of  the  kinds  of  security-related  information 
maintained  for  each  user  are  given  in  Chapter  5.  Thus,  in  implementation  of 
the  directory  management  function,  the  three  phases  described  above  may  be 
elaborated  as  follows:  In  the  first  phase,  the  descriptor-to-descriptor-id 
table  DDIT  Is  Searched  to  determine  the  corresponding  descriptor  or  descriptors 
for  each  predicate  of  a  query  in  the  case  of  a  non-insert  request  and  for  each 
keyword  of  a  record  in  the  case  of  an  insert  request.  In  the  sequel,  we  shall 
refer  to  this  phase  as  the  descriptor  search  phase  and  we  shall  refer  to  the 
processing  performed  therein  as  descriptor  processing.  In  the  next  step,  the 
augmented  CDT  is  searched  and  the  corresponding  single  cluster  in  the  case  of 
an  insert  request  or  the  corresponding  set  of  clusters  in  the  case  of  a  non¬ 
insert  request  is  determined.  Once  the  cluster  or  cluster  set  is  determined, 
the  authorized  cluster  or  clusters  may  be  selected  on  the  basis  of  security- 
related  information  in  the  augmented  CDT.  By  searching  the  same  augmented 
CDT,  the  addresses  of  authorized  cluster (s)  can  be  found.  We  shall  refer  to 
this  step,  in  the  sequel,  as  the  address  generation  phase  and  we  shall  refer 
to  the  processing  performed  therein  as  address  generation.  Thus,  the  three 
phases  of  directory  management  are  now  consolidated  in  two. 

Since  the  descriptor  search  phase  and  the  address  generation  phase  are 
similar  for  both  insert  and  non- insert  requests,  in  the  sequel  we  shall  only 
consider  these  two  phases  for  non- insert  requests.  The  discussion  extends 
in  a  straightforward  manner  to  insert  requests. 

4.2.2  Processing  Strategies  for  Multiple  Back-ends 

In  previous  discussions,  we  make  no  distinction  whether  the  two  phases 
were  to  be  carried  out  in  a  single  computer  or  in  multiple  computers  (a  con¬ 
troller  and  several  back-ends).  It  is  now  necessary  to  discuss  how  these  two 
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phases  will  be  executed  in  the  controller  and  multiple  back-ends  of  MDBS. 

We  have  identified  for  MDBS  six  different  strategies  for  carrying  out  the 
descriptor  search  phase  in  the  multiple  back-ends  and  one  strategy  for 
carrying  out  the  descriptor  search  phase  in  the  controller*  thereby  a  total 
of  seven  strategies  are  identified  for  various  ways  of  carrying  out  the  descrip¬ 
tor  search  phase.  We  have  also  identified  two  strategies,  one  for  carrying  out 
the  address  generation  phase  in  the  controller,  the  other  in  the  back-ends. 

For  completeness,  we  also  consider  an  eighth  strategy  for  directory  management 
in  which  both  phases  are  carried  out  in  the  controller.  These  eight  strategies 
for  directory  management  are  proposed  and  evaluated  in  the  next  sections. 

Let  us  first,  however,  indicate  our  preference  for  a  strategy  in  which  the 
address  generation  phase  is  carried  out  in  multiple  back-ends  rather  than  in 
the  controller.  By  carrying  out  this  phase  in  the  back-ends,  MDBS  will  be  al¬ 
leviated  from  the  controller  limitation  problem.  Since  this  phase  deals  with  the 
generation  of  secondary  memory  addresses,  each  back-end  would  need  to  gener¬ 
ate  only  those  secondary  memory  addresses  associated  with  that  back-end.  On 
the  other  hand,  if  the  addresses  were  to  be  generated  by  the  controller,  the 
controller  would  need  to  generate  all  the  relevant  secondary  memory  addresses 
associated  with  the  entire  back-ends.  Thus,  it  is  easy  to  see  that  the  work 
of  address  generation  is  evenly  divided  up  among  the  back-ends  in  the  former 
case.  This  is  essential  if  we  are  to  achieve  an  ideal  system  in  which  the  re¬ 
sponse  time  is  inversely  proportional  to  the  number  of  back-ends.  This  con¬ 
cludes  our  discussion  of  our  preference  for  a  strategy  in  which  the  address 
generation  phase  is  carried  out  in  the  multiple  back-ends. 

We  note  that  the  address  generation  phase  actually  includes  all  security 
related  processing  also.  The  time  for  address  generation,  which  is  essentially 
the  time  for  searching  the  augmented  CDT,  will  depend  on  the  size  of  each  entry 
in  the  augmented  CDT.  Hence,  it  will  depend  on  whether  or  not  the  table  is 
augmented  with  security  information.  In  the  sequel,  our  analysis  will  assume 
that  no  security  information  is  contained  in  the  CDT.  There  are  two  reasons 
for  this  assumption. 

First,  we  wish  to  analyze  the  performance  of  an  MDBS  in  which  security  is 
not  enforced.  This  is  because  many  implementations  of  MDBS  may  wish  to  provide 
only  the  basic  database  management  functions  and  may  not  wish  to  provide  se¬ 
curity  enforcement. 

Second,  the  enforcement  of  security  will  not  affect  our  comparative  study 
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of  the  various  strategies,  since  all  the  back-ends  are  expected  to  spend  the 
same  amount  of  work  in  security-related  processing. 

We  now  proceed  to  discuss  the  eight  different  strategies  for  directory 
management  in  the  following  sections.  VJe  propose  various  strategies  for  dir¬ 
ectory  management  in  terms  of  descriptor  searching  and  address  generation. 

These  alternatives  will  be  evaluated  from  the  viewpoint  of  performance  and  sto¬ 
rage  requirements. 


A.  The  Centralized  Strategy 

In  this  strategy,  all  the  directory  management  is  done  at  the  controller. 

The  controller  maintains  the  descriptors  for  all  the  directory  attributes. 

In  other  words,  both  DDIT  and  the  augmented  CDT  are  stored  with  the  controller. 
Given  a  query,  the  controller  first  performs  the  descriptor  processing  and 
then  address  generation  by  utilizing  the  aforementioned  tables.  Eventually, 
a  set  of  secondary  memory  addresses  of  relevant  records  is  generated  at  the 
controller.  We  note  that  a  secondary  memory  address  consists  of  not  just  the 
track  and  cylinder  information  about  the  records  but  also  the  information  about 
the  back-end  in  which  the  track  and  cylinder  are  located.  None  of  the  remaining 
seven  strategies  will  need  the  back-end  information  in  a  secondary  memory  ad¬ 
dress.  This  is  because  all  the  remaining  strategies  will  do  the  address  gener¬ 
ation  at  the  respective  back-end  rather  than  at  the  controller. 

B.  The  Partially  Centralized  Strategy 

The  descriptor  processing  is  done  at  the  controller  in  this  strategy  as 
in  the  previous  strategy.  The  corresponding  descriptors  are  then  broadcasted 
to  all  the  back-ends.  Each  back-end  will  now  carry  out  the  address  generation 
phase  in  an  independent  fashion.  The  reason  why  we  expect  this  strategy  to  be 
superior  to  the  previous  strategy  is  two-fold.  First,  the  work  of  address 
generation  that  could  be  done  at  the  controller  is  now  distributed  to  the  back¬ 
ends.  This  should  alleviate  the  controller  limitation  problem.  Second,  the 
work  needed  for  address  generation  is  divided  in  such  a  way  that  a  back-end 
needs  only  to  generate  the  addresses  of  the  secondary  memory  of  that  back-end. 

In  this  strategy,  the  descriptor  search  phase  of  the  directory  management  is 
still  performed  at  the  controller.  The  six  remaining  strategies  we  will  con¬ 
sider  are  those  which  try  to  diminish  the  effect  of  having  this  descriptor  search 
phase  performed  solely  at  the  controller.  In  other  words,  the  following  stra- 
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tegies  will  alleviate  the  controller  limitation  problem  further. 

C.  The  Rotating  Strategy 

In  the  rotating  strategy,  the  descriptor  processing  during  the  descrip¬ 
tor  search  phase  is  done  in  a  round-robin  fashion  among  the  controller  and 
the  back-ends.  More  specifically,  the  first  query  is  processed  at  the  con¬ 
troller,  the  second  query  is  processed  at  the  first  back-end,  the  third  query 
is  processed  at  the  second  back-end,  and  so  on.  As  a  result,  it  is  hoped  that 
some  alleviation  of  the  controller  limitation  problem  will  take  place.  As  in 
the  partially  centralized  strategy,  the  address  generation  is  done  individually 
at  each  back-end.  When  the  arrival  rate  of  requests  to  MDBS  is  low, 
this  strategy  will  not  perform  any  better  than  the  partially  centralized  stra¬ 
tegy.  On  the  other  hand,  when  the  arrival  rate  of  request  is  high,  this 
strategy  may  lead  to  an  improvement  in  performance  over  the  partially  centra¬ 
lized  strategy.  This  is  because  the  descriptor  processing  on  a  number  of  queries 
by  multiple  back-ends  may  be  overlapped  while  the  descriptor  processing  on  in¬ 
dividual  queries  by  the  controller  must  be  done  sequentially  in  the  partially 
centralized  strategy. 

D.  The  Rotating  Without  Controller  Strategy 

This  strategy  is  very  similar  to  the  previous  one.  The  only  difference 
is  that  no  descriptor  processing  is  done  at  the  controller.  Thus,  the  descrip¬ 
tor  processing  for  the  first  query  is  done  at  the  first  back-end,  the  descrip¬ 
tor  processing  for  the  second  query  is  done  at  the  second  back-end,  and  so  on. 
After  descriptor  processing  is  completed  at  a  back-end,  the  corresponding  des¬ 
criptors  must  be  broadcast  to  all  back-ends  so  that  they  may  proceed  with  the 
address  generation  phase.  The  only  reason  for  introducing  this  strategy  into 
consideration  is  that  it  would  appear  to  alleviate  the  controller  limitation 
problem  completely  from  directory  management. 

We  note  that  both  the  rotating  with  controller  and  the  rotating  without 
controller  strategies  require  the  duplication  of  the  necessary  tables  (i.e., 

DDIT)  at  all  back-ends.  This  is  tolerated  for  the  following 
reasons.  It  will  be  shown,  later,  that  the  tables  needed  for  address  genera¬ 
tion  are  very  much  larger  than  the  tables  needed  for  descriptor  processing.  As 
a  result,  duplicating  the  tables  needed  for  descriptor  processing  for  multiple 
back-end  is  tolerable.  Furthermore,  we  are  willing  to  sacrifice  storage,  if 
it  means  an  improvement  in  performance. 
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Up  to  this  point,  two  of  the  strategies  allow  queries  to  be  processed 
parallelly  in  the  descriptor  search  phase.  There  is  no  parallel  descriptor 
processing  of  predicates  for  a  given  query.  In  the  following  three  strategies, 
we  explore  the  possibility  of  parallel  descriptor  processing  of  predicates  for 
individual  queries  during  the  descriptor  search  phase. 

E.  The  Fully  Duplicated  Strategy 

In  the  fully  duplicated  strategy,  the  descriptor  processing  is  done  across 
the  back-ends.  More  specifically,  if  there  are  n  back-ends  in  MDBS  and  a 
query  contains  x  predicates,  each  back-end  will  process  x/n  predicates  and 
generate  x/n  corresponding  descriptors  which  will* in  turn,  be  communicated  to 
each  other.  Each  back-end  may  then  proceed  to  the  address  generation  phase. 

Such  a  strategy  also  requires  the  necessary  tables  to  be  duplicated  at  each 
back-end. 

F.  The  Descriptors  Dividing  by  Attribute  Strategy 

In  this  strategy,  we  explore  the  possibility  of  achieving  parallel  des¬ 
criptor  processing  without  the  need  for  any  duplication  of  necessary  tables. 

If  there  are  i  directory  attributes  and  n  back-ends  in  MDBS,  each  back-end  will 
maintain  the  descriptors  corresponding  to  i/n  attributes.  Each  back-end  will 
process  those  predicates  in  a  query  where  the  descriptors  corresponding  to  the 
attribute  in  the  predicate  is  maintained  at  that  back-end.  It  is  expected  that 
each  back-end  will  do  an  equal  amount  of  descriptor  processing,  although  there 
may  be  cases  where  one  back-end  does  more  descripter  processing  than  the  others. 
This  happens  if  all  the  predicates  in  a  query  are  such  that  the  descriptors 
that  need  to  be  searched  are  all  stored  at  the  same  back-end. 

G.  The  Descriptors  Division  Within  Attribute  Strategy 

Like  the  previous  strategy,  this  strategy  also  attempts  parallel  descriptor 
processing  without  any  duplication  of  the  necessary  tables.  If  there  are  i  des¬ 
criptors  on  each  directory  attribute,  each  back-end  will  maintain  for  each  attrib¬ 
ute  i/n  descriptors.  Thus,  descriptor  processing  is  spread  over  the  back-ends. 

All  back-ends  will  participate  in  the  descriptor  processing  of  a  query.  After  each 
back-end  obtains  some  results,  they  exchange  their  results.  Then,  each  back-end 
preceed3  with  its  own  address  generation  phase. 

H.  The  Fully  Replicated  Strategy 

This  is  the  final  strategy  we  consider  for  doing  directory  management.  In 
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this  strategy,  each  back-end  will  work  on  the  entire  query  during  the  descrip¬ 
tor  search  phase.  The  advantage  of  letting  each  back-end  do  the  descriptor 
processing  on  all  queries  is  that,  unlike  the  previous  three  strategies,  ex¬ 
changes  among  back-ends  are  unnecessary  in  this  strategy  because  all  back-ends 
have  all  the  needed  results.  After  completing  the  descriptor  processing,  each 
back-end  does  its  address  generation. 

4.2.3  Performance  Evaluation  of  the  Directory  Management  Strategies 

In  this  section,  we  compare  the  eight  strategies  for  directory  management 
on  the  basis  of  performance.  For  our  convenience,  we  name  these  strategies  A 
through  H,  respectively. 

Two  different  approaches  are  used  for  the  performance  analysis.  The  first 
approach  considers  the  directory  management  process  alone.  It  does  not  consider 
the  fact  that  there  may  be  queues  at  the  various  back-ends  and  that  the  control¬ 
ler  may  allow  overlapping  of  query  handling  for  directory  management.  Specifically, 
the  different  strategies  are  compared  in  terms  of  the  time  duration  from  the  re¬ 
ceipt  of  a  request  to  the  point  where  all  the  necessary  secondary  memory  addresses 
are  generated,  including  the  time  taken  for  messages  exchanges. 

The  second  approach  studies  the  relative  response  time  for  a  typical  re¬ 
quest  when  each  of  these  eight  strategies  is  employed  for  directory  management. 

In  other  words,  the  directory  management  is  considered  in  terms  of  its  effect 
on  the  overall  response  time  of  a  request.  This  study  employs  a  closed  queueing 
network  model  of  MDBS. 

One  issue  that  is  important  to  both  approaches  is  whether  the  necessary 
tables  for  descriptor  processing  and  address  generation  can  be  stored  in  the 
main  memory.  In  our  analysis,  we  assume  that  a  page  of  the  descriptor-to- 
descriptor-id  table  DDIT  is  in  the  main  memory  with  probability  p,  where  p  is 
high.  This  is  because  DDIT  is  small.  For  the  augmented  cluster  definition 
table,  i.e.,  the  augmented  CDT,  on  the  other  hand,  we  assume  that  only  a  cer¬ 
tain  amount  of  the  main  memory,  m,  is  reserved  for  the  table.  Now,  if  the 
augmented  CDT  requires  greater  amount  of  memory,  g,  then  pages  of  the  augmented 

CDT  are  assumed  to  be  in  the  main  memory  with  probability  — .  In  general,  —  <  p. 

8  8 

Another  issue  to  be  resolved  before  we  begin  our  analyses  concerns  the 
searching  of  the  descriptors.  It  is  clearly  possible  to  store  the  descriptors 
in  sorted  order  and  search  for  the  right  descriptor  using  a  binary  search. 

Another  technique  would  be  to  store  the  descriptors  as  a  B-tree,  whose  leaf  nodes 
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are  the  descriptors  themselves.  However,  since  the  number  of  descriptors  is 
not  expected  to  be  very  large,  we  shall  assume  that  a  binary  search  is  used. 

A.  Time  Analyses  and  Performance  Equations 

In  this  section,  we  present  the  results  for  the  execution  time  of  the 
directory  management  algorithms  from  the  point  that  a  query  is  received  to  the 
point  where  the  addresses  of  the  records  are  generated  and  made  available  at 
the  back-ends.  This  time  therefore  includes  the  time  for  descriptor  processing 
and  address  generation. 

The  following  observations  may  be  used  to  simplify  our  calculations. 

First,  the  centralized  strategy  and  the  partially  centralized  strategy  differ 
only  in  the  address  generation  phases.  Clearly,  the  partially  centralized  stra¬ 
tegy  takes  less  time  for  address  generation  and,  hence,  is  superior  to  the  cen¬ 
tralized  strategy.  We  only  need  to  compare  the  remaining  strategies  to  deter¬ 
mine  the  best  one.  All  the  remaining  strategies  use  exactly  the  same  algorithm 
for  address  generation.  Hence,  the  time  for  address  generation  need  not  be 
included  in  our  comparison  study.  With  these  observations,  let  us  proceed  with 
our  comparison  study,  let, 

tm:  time  to  send  a  message  from  (to)  the  controller 

D:  total  number  of  descriptors 

i:  total  number  of  directory  attributes 

td :  time  to  read  a  descriptor  from  the  main  memory 

tp:  time  to  read  an  entry  from  the  DDIT  in  the  main  memory, 

ta:  time  to  read  a  predicate 

x:  number  of  predicates  in  a  query  conjunction 
tpr:  time  for  an  arithmetic  operation 

k:  number  of  descriptors  per  secondary  memory  page 
tpg:  time  to  access  a  secondary  memory  page 

lg:  logarithm  of  the  base  2 
u(z):  the  nearest  integer  greater  than  or  equal  to  z 

tb:  time  taken  for  doing  a  binary  search  on  k  descriptors 

In  the  ensuing  discussions,  let  us  calculate  the  total  time  taken  in  order 
to  complete  the  descriptor  search  phase  and  have  the  corresponding  set  of  des¬ 
criptors  available  at  all  back-ends.  That  is,  let  us  calculate  the  time  taken 
for  directory  processing  and  the  time  for  exchanging  any  messages  that  may  be 
needed.  For  simplicity,  we  assume  that  users  only  employ  single  conjunctions 
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in  their  requests.  If  a  disjunction  of  p  conjunctions  is  used  in  a  request, 
it  may  be  easily  treated  as  p  separate  requests. 

In  Strategies  A,  B,  C,  D  and  H,  it  is  easy  to  see  that  the  descriptor 
processing  on  the  entire  query  conjunction  is  done  at  a  single  computer.  Thus, 
the  average-case  time  is  expressed  as  follows 

x(ta  +  (i/2)(tp  +  tpr)  +  lg((td  +  4tpr)D/i)  +  2tpr)  -  (1) 

Here  is  the  explanation.  For  each  of  the  x  predicates  in  the  conjunction,  the 
following  must  be  done.  First,  the  predicate  must  be  read  and  this  takes  ta 
time  units.  Next,  the  i  entries  in  DDIT  must  be  searched.  On  the  average,  i/2 
entries  will  need  to  be  searched.  Each  of  the  i/2  entries  must  be  read  (taking 
tp  time  units  each  time)  and  compared  with  the  attribute  in  the  predicate 
(taking  tpr  time  units).  Next,  the  set  of  D/1  descriptors  must  be  searched. 

Our  algorithm  for  a  binary  search  of  N  descriptors  is  included  in  ap¬ 
pendix  D.  From  the  algorithm,  the  total  time  to  do  binary  search  on  N  des¬ 
criptor  is 

2tpr  +  lgN(time  to  read  a  descriptor  +  4tpr) 

Thus,  the  time  taken  for  doing  a  binar  '  search  on  D/i  descriptors  will  take 

2tpr  +  lg((td  +  4tpr)D/i) 

Together,  we  obtain  equation  (1). 

In  arriving  at  equation  (1),  we  assumed  that  all  the  descriptors  are  in  the 
main  memory.  Let  us  now  improve  equation  (1)  by  assuming  that  only  a  fraction 
p  of  the  descriptor  pages  are  in  the  main  memory.  We  assume  that  all  the  D 
descriptors  are  stored  in  secondary-memory  pages  each  of  which  contains  up  to 
k  descriptors.  Also,  descriptors  for  different  attributes  are  stored  in  dif¬ 
ferent  pages.  In  searching  descriptors  organized  as  pages,  we  assume  the  fol¬ 
lowing  algorithm  is  used.  Pages  are  retrieved  sequentially.  For  each  page 
retrieved,  the  ranges  of  the  first  and  last  descriptors  in  the  page  are  compared 
to  the  value  in  the  predicate.  This  will  tell  us  if  the  page  contains  the  des¬ 
criptor  we  are  looking  for.  If  so,  a  binary  search  of  that  page  is  performed. 

If  not,  the  next  page  is  retrieved,  and  so  on.  Since  the  first  page  must  be  pro¬ 
cessed  before  the  second  one  is  brought  in  and  because  the  pages  needed  may  not 
be  adjacent  in  the  secondary  memory,  we  assume  no  I/O  overlap  with  CPU  proces¬ 
sing.  Then,  the  time  for  descriptor  processing  is: 

x(ta  +  (i/2) (tp  +  tpr)  +  (u(D/(2ik))  -  1)  ( (1  -  p)tpg  +  2(td  +  tpr)) 

+  ((1  -  p)tpg  +  2(td  +  tpr)  +  tb)) 


(2) 


in  the  average  case.  The  worst-case  time  may  be  obtained  from  equation  (2)  by 
replacing  u(D/(2ik))  with  u(D/(ik))  and  (i/2)(tp  +  tpr)  with  i(tp  +  tpr).  Here, 
the  time  taken  for  doing  a  binary  search  on  k  descriptors,  tb,  is  (2tpr 
+  lg  k  (td  +  4tpr ) ^  . 

Let  tdr  be  the  time  taken  for  descriptor  processing.  Then  tdr  is  equal  to 
above  equation  (2).  Finally,  the  time  taken  for  descriptor  processing  and  mes¬ 
sage  exchanging  in  strategies  A,  B  and  H  is 

(tin  +  tdr) 

where  tm  is  the  time  taken  to  broadcast  the  corresponding  descriptors  from  the 
controller  to  the  back-ends  in  strategies  A  and  B,  tm  is  the  time  to  broadcast 
the  original  query  conjunction  to  all  the  back-ends  each  of  which  individually 
spends  tdr  time  units  to  calculate  the  corresponding  descriptors  in  Strategy  H. 

In  Strategy  C,  the  time  for  descriptor  processing  and  message  exchanging  is 

(n/(n  +  1))  *  (tdr  +  2tm)  +  (l/(n  +  1))  *  (tdr  +  tm) 

This  is  explained  as  follows  for  a  system  with  n  back-ends.  If  the  processing 
is  done  in  the  controller  and  happens  once  every  (n + 1)  times,  the  processing 
must  include  the  time  for  braodcasting  the  corresponding  descriptors  from  the 
controller  to  the  back-ends.  On  the  other  hand,  if  the  processing  is  done  at 
one  of  the  back-ends  as  happens  n  out  of  (n  +  1)  times,  then  the  processing 
time  must  include  the  time  for  sending  the  original  query  conjunction  to  a  back¬ 
end  and  the  time  for  broadcasting  the  corresponding  descriptors  to  all  back¬ 
ends  from  that  back-end . 

Finally,  in  Strategy  D,  the  time  for  descriptor  processing  and  message 
exchanging  is 

(tdr  +  2tm) 

We  now  need  to  calculate  the  descriptor  processing  time  for  strategies  E, 

F  and  G.  For  Strategy  E,  the  time  for  descriptor  processing  in  the  average 
case  is: 

u(x/n)(ta  +  (i/2)(tp  +  tpr)  +  ((l-p)tpg  +  2(td  +  tpr)  +  tb) 

+  (u (D/ (2ik) )  -  l)((l-p)tpg  +  2(td  +  tpr)))  +  2tm -  (3) 

The  time  for  descriptor  processing  in  the  worst  case  may  be  obtained  from  equa¬ 
tion  3  by  replacing  (i/2 )  ( td  +  tpr)  with  i(tp  +  tpr)  and  u(D/(2ik))  with 
u(D/(ik)).  Time  for  two-message  exchanges  is  also  included  in  the  above  equation. 
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One  of  the  messages  is  for  braodcasting  the  original  query  conjunction  from 
the  controller  to  all  the  back-ends.  The  second  one  is  for  exchanging  partial 
results  among  the  back-ends.  We  note  that  this  method  requires  the  presence 
of  all  the  descriptors  at  each  back-end.  Furthermore,  no  more  than  x  back-ends 
can  be  executing  a  query  conjunction  simultaneously,  since  there  are  x  predi¬ 
cates  in  the  query  conjunction.  This  is  the  maximum  degree  of  parallelism  that 
can  be  brought  to  bear  on  the  processing  of  descriptors. 

Let  us  now  present  the  descriptor  processing  and  message  exchange  time  for 
Strategy  F  as  follows. 

u(x/n) (ta  +  u(i/(2n))(tp  +  tpr)  +  ((1-  p)tpg  +  2(td  +  tpr)  +  tb) 

+(u(D/(2ik))  -  1)((1  -  p)tpg  +  2 (td  +  tpr)))  +  2tm -  (A) 

The  worst  case  time  is  obtained  from  equation  by  replacing  u(x/n)  with  x, 
u(i/(2n))  with  u(i/n),  and  D/(2ik)  with  D/(ik).  In  general,  however,  we  expect 
this  strategy  to  perform  at  the  time  closer  to  the  average-case  time  than  the 
worst-case  time.  This  is  because  the  database  administrator  may  use  his  know¬ 
ledge  of  the  typical  query  conjunctions  to  assign  descriptors  to  back-ends  in 
an  appropriate  fashion. 

Finally,  the  average-case  time  for  descriptor  processing  and  message  exchange 
by  Strategy  G  is  as  follows. 

x(ta  +  u(i/2)(tp  +  tpr)  +  ((l-p)tpg  +  2(td  +  tpr)  +  tb) 

+  (u(D/(2nik))  -  1)  ((1  -  p)tpg  +  2(td  +  tpr)))  +  2tm -  (5) 

The  worst-case  time  is  obtained  from  equation  5  by  replacing  u(i/2)  with  i  and 
D/ (2nik)  with  D/(nik).  It  is  clear  that  for  small  values  of  D  and  large  values  of 
k,  increasing  n  is  going  to  have  little  effect  in  reducing  the  time.  For  in¬ 
stance,  consider  that  the  number  of  descriptors,  D,  is  100  and  that  the  number 
of  descriptors  stored  in  a  page,  k,  is  100.  Then  u(D/(nik))  =  1,  irrespective 
of  the  number  of  back-ends,  n.  Since  u(D/(nik))  is  the  only  expression  in  equa¬ 
tion  (5)  which  contains  n,  it  is  clear  that  increasing  n  is  not  going  to  reduce 
the  time  for  descriptor  processing  and  message  exchange.  However,  this  strategy 
should  prove  advantageous  for  large  values  of  D. 

B.  Computations  and  Their  Interpretations  Resulted  from  the  Performance 
Equations 

In  order  to  compare  the  descriptor  processing  and  message  exchange  times 
of  MDBS  under  various  strategies,  we  use  the  following  values  for  the  parameters 
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ln  the  equations  (1)  through  (5)  to  calculate  the  times. 

tm  ■  8msecs  (time  to  send  a  message  from  (to)  the  controller) 

tpr  ■  5psecs  (time  for  an  arithmetic  operation) 

td  *  5ysecs  (time  to  read  a  descriptor  from  the  main  memory) 

ta  ■  3ysecs  (time  to  read  a  predicate) 

tp  «  3usecs  (time  to  read  an  entry  from  the  DDTT  in  the  main  memory) 

tpg  =  47.3msecs  (time  to  access  a  secondary  memory  page) 

k  ■  85  (no.  of  descriptors  per  secondary  memory  page) 

P  *  0.9  (probability  of  a  page  of  DDIT  entries  being  in  the  main  memory) 

x  »  5  (no.  of  predicates  in  a  query  conjunction) 

The  value  of  n,  the  number  of  back-ends,  is  chosen  from  the  set  {2,  5,  8}. 

The  ratio  D/i,  the  number  of  descriptors  per  directory  attribute,  is  chosen 
from  the  set  {10,  20,  30,  40).  The  number  of  directory  attributes  is  chosen 
from  the  set  {5,  10,  15). 

The  results  of  our  calculations  are  shown  in  Figure  16.  In  this  figure, 
we  present  two  rows  of  results  for  each  number  of  back-ends.  The  first  row 
gives  the  average-case  times  and  the  second  row  gives  the  worst-case  times  for 
descriptor  processing  and  message  exchanging.  It  is  seen  that  two  of  the  three 
strategies  that  utilize  parallel  processing  of  predicates  of  a  given  query 
conjunction  during  the  descriptors  search  phase,  namely  strategies  E  and  F, 
give  the  best  average-case  results.  They  consistently  outperform  all  the  other 
strategies  over  the  entire  range  of  variation  which  we  tried  for  the  number  of 
back-ends,  the  number  of  directory  attributes  and  the  number  of  descriptors  per 
attribute.  The  times  under  strategies  E  and  F  may  be  as  much  as  12  msecs  less 
than  the  time  under  the  next  best  strategy.  This  occurs  when  the  number  of 
back-ends  employed  is  eight. 

Looking  at  the  worst-case  results,  we  see  that  Strategy  E  is,  once  again, 
the  best  strategy.  The  worst-case  performance  of  Strategy  F,  however,  is  far 
worse  than  that  of  Strategy  E.  This  is  because  the  worst  case  for  Strategy  F 
occurs  when  all  the  predicates  in  a  query  conjunction  are  such  that  the  des¬ 
criptors  that  need  to  be  searched  for  descriptor  processing  are  all  stored  at 
a  single  back-end.  Thus,  all  the  descriptor  processing  is  performed  at  this 
single  back-end  and  parallel  descriptor  precessing  is  not  achieved.  In  Strat¬ 
egy  E,  on  the  other  hand,  the  duplication  of  descriptors  allows  us  to  achieve 
parallel  descriptor  processing  for  all  types  of  query  conjunctions. 

In  addition,  the  following  observations  may  be  made  from  the  results  of 
Figure  16.  Under  Strategies  E  and  F,  the  performance  of  MDBS  improves  with  an 
increase  in  the  number  of  back-ends.  However,  this  improvement  will  not  go  fur¬ 
ther  if  the  number  of  back-ands  is  greater  than  the  number  of  predicates  in  a  query 
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Figure  16.  Directory-Processing-and-Message-Exchanging  Times 
(in  msecs)  Under  Different  Strategies 
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conjunction.  The  performance  of  MDBS  under  other  strategies,  do  not  improve 
with  increasing  number  of  back-ends.  Thus,  these  other  strategies  are  not 
suitable  for  implementation  in  MDBS. 

The  results  of  Figure  16  do  not  clearly  indicate  whether  performance  of 
MDBS  under  any  strategy  is  independent  of  the  number  of  descriptors  per  direc¬ 
tory  attribute.  To  test  whether  the  performance  of  MDBS  is  independent  of  the 
number  of  descriptors  per  attribute,  we  make  a  second  set  of  calculations  with 
the  number  of  descriptors  per  attribute  varying  between  100  and  200.  The  re¬ 
sults  are  shown  in  Figure  17.  Comparing  corresponding  entries  in  Figure  16  with 
the  ones  in  Figure  17,  we  see  that  the  increasing  number  of  descriptors  per  at¬ 
tribute  affects  the  performance  considerably.  Also,  we  see  that  Strategies  E 
and  F  provide  the  best  average-case  results.  In  fact,  differences  as  large  as 
30  msecs  are  now  possible  between  these  two  strategies  and  the  remaining 
strategies.  Interestingly  enough,  the  performance  of  Strategy  G  is  now  compar¬ 
able  to  that  of  Strategies  E  and  F. 

We  make  a  final  set  of  calculations  with  the  number  of  descriptors  per  at¬ 
tribute  varying  from  700  to  900.  Of  course,  this  may  be  an  unreasonably  large 
range  of  values  for  the  number  of  descriptors  per  attribute.  However,  we  are 
interested  in  the  results  of  this  calculation  from  a  point  of  view  where  MDBS 
is  heavily  loaded  and  utilized.  The  results,  shown  in  Figure  18,  indicate 
that  Strategy  G  is  now  comparable  to,  and  occasionally  even  better  than. 

Strategies  E  and  F.  By  and  large,  these  three  strategies  which  employ  parallel 
processing  of  predicates  by  multiple  back-ends  during  the  descriptor  search  phase 
outperform  the  other  strategies. 

C.  A  Preliminary  Conclusion  Based  on  the  Performance  Equations 

The  results  of  our  study  may  be  summarized  as  follows.  The  three  Strategies  - 
namely.  Strategies  E,  F  and  G,  which  utilize  parallel  processing  of  predicates 
of  a  query  conjunction  during  the  descriptor  search  phase,  may  provide  better 
performance  than  the  other  five  strategies.  Furthermore,  the  employment  of 
any  of  these  strategies  in  MDBS  leads  to  an  improvement  in  performance  with  each 
increase  in  the  number  of  back-ends.  However,  the  extremely  poor  worst-case  per¬ 
formance  of  Strategy  F  would  eliminate  it  from  consideration  for  MDBS  implemen¬ 
tation.  Similarly,  the  poor  average-and-worst  case  performance  of  Strategy  G 
for  typical  values  of  number  of  attribute  and  number  of  descriptors  per  attribute 
would  eliminate  it  from  consideration  for  MDBS  implementation.  Consequently, 

- _J 


Number  of  Attributes  *  5 

Number  of  Descriptors  Per  Attribute  *  100 


Strategy 

Back-''\^ 

ends 

A 

— 

B 

C 

D 

E 

F 

_ 

G 

H 

2 

32.74 

32.74 

30.84 

40.74 

32.74 

56.57 

56.57 

61.90 

45.14 

64.49 

40.82 

56.57 

5 

32.74 

32.74 

20.95 

20.93 

40.74 

32.74 

56.57 

56.57 

63.23 

64.57 

25.71 

64.41 

40.82 

56.57 

8 

32.74 

32.74 

39.85 

40.74 

20.95 

20.93 

40.74 

32.74 

56.57 

56.57 

63.68 

64.57 

25.71 

64.41 

40.82 

56.57 

Number  of  Attributes  =  5 

Number  of  Descriptors  Per  Attribute  =>  200 


Strategy 

Back^^^ 

ends 

A 

B 

’ 

C 

D 

E 

F 

G 

H 

2 

56.49 

56.49 

61.82 

64.49 

45.09 

45.07 

40.74 

56.49 

80.32 

80.32 

85.65 

88.32 

59.39 

88.24 

40.82 

80.32 

5 

56.49 

56.49 

63.15 

64.49 

25.70 

25.68 

40.74 

56.49 

80.32 

80.32 

86.98 

88.32 

30.46 

88.16 

40.82 

80.32 

8 

56.49 

56.49 

63.60 

64.49 

25.70 

25.68 

40.74 

56.49 

_ 

80.32 

80.32 

87.43 

88.32 

30.46 

88.16 

40.82 

80.32 

Number  of  Attributes  *  5 

Number  of  Descriptors  Per  Attribute  ■  300 


^''xStrategy 

Back-\. 

ends 

A 

B 

C 

D 

E 

F 

G 

H 

2 

56.49 

56.49 

61.82 

64.49 

45.09 

45.07 

40.74 

56.49 

104.07 

104.07 

109.40 

112.07 

73.64 

111.99 

40.82 

104.07 

5 

56.49 

56.49 

63.15 

64.49 

25.70 

25.68 

40.74 

56.49 

104.07 

104.07 

110.73 

112.07 

35.21 

111.91 

40.82 

104.07 

8 

56.49 

56.49 

63.60 

64.49 

25.70 

25.68 

40.74 

56.49 

104.07 

104.07 

111.18 

112.07 

35.21 

111.91 

40.82 

104.07 

Note:  There  are  two  rows  corresponding  to  each  value  of  the  number  of  back¬ 
ends.  The  first  row  gives  the  average  case  times  and  the  second  row 
gives  the  worst  case  times. 


Figure  17.  Descriptor  Processing  and  Message  Exchanging  Times 

(in  msecs)  for  Various  Directory  Management  Strategies 
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the  superior  strategy  for  implementation  in  MDBS  is  Strategy  E. 


D.  Limitations  of  Time  Analysis  and  Performance  Equations  in  the 
Evaluation  of  Directory  Management  Strategies 

Our  previous  evaluation  of  various  strategies  on  the  basis  of  performance 
equation  has  a  number  of  limitations.  First,  the  effects  of  the  directory 
management  strategy  on  other  aspects  of  MDBS  are  not  considered.  It  is  pos¬ 
sible  that  some  of  the  strategies  may  create  bottlenecks  at  some  component 
of  MDBS,  thus  resulting  in  very  poor  overall  response  times.  Second,  the  ef¬ 
fects  of  queueing  delays  of  requests  are  neglected.  Third,  the  requirement 
that  Strategies  E  and  F  lead  to  improved  performance  with  an  increase  in  the 
number  of  back-ends  only  when  the  number  of  back-ends  is  no  greater  than  the 
number  of  predicates  in  a  query  conjunction  seems  unreasonable.  We  expect  that 
there  will  be  performance  improvement  even  when  the  number  of  back-ends  far 
exceeds  the  number  of  predicates  in  a  query  conjunction.  For  example,  for 
Strategy  E,  F  or  G,  if  the  number  of  predicates  per  query  conjunction  is  five, 
and  there  are  ten  back-ends,  two  query  conjunctions  can  be  handled  by  MDBS  with 
each  back-end  processing  a  predicate.  An  1!DBS  with  only  five  bach-ends  would 
not  be  able  to  handle  all  the  predicates  concurrently.  Thus,  MDBS  with  five 
back-ends  should  perform  worse  than  MDBS  with  ten  back-ends,  although  the  re¬ 
sults  of  our  previous  study  would  not  indicate  this  observation.  Fourth,  the 
interpretation  of  the  worst-case  performance  of  Strategy  F  may  be  misleading. 

In  reality,  the  performance  tends  to  average  out  and  is  close  to  the  perfor¬ 
mance  of  the  average  case.  However,  looking  at  the  results  of  Figures  16,  17 
and  18,  one  may  be  tempted  to  remove  Strategy  F  from  consideration  for  MDBS 
implementation.  Finally,  the  strong  point  of  Strategies  C  and  D  has  not  been 
brought  out  by  the  study.  Their  advantage  comes  from  the  fact  that  multiple 
queries  may  be  simultaneously  executed  in  the  different  back-ends.  That  is, 
they  benefit  from  inter-query  parallelism  (where  several  query  conjunctions 
may  be  processed  in  parallel  by  the  back-ends)  rather  than  Intra-query  paral¬ 
lelism  (where  only  predicates  of  a  query  conjunction  may  be  processed  in  paral¬ 
lel  by  the  back-ends). 

K.  Performance  Analysis  Based  on  s  Clo.sud  Oueueing  Network  Model 

We  intend  to  compare  the  various  directory  management  strategies  by  using 
a  closed  queueing  network  model.  Such  a  model  overcomes  all  the  limitations 
of  the  previous  study.  First  of  all,  we  are  going  to  model  all  system  activities 


-  11  3- 


in  MDBS  including  parsing  of  requests,  descriptor  processing,  address  genera¬ 
tion,  retrieval  of  records  from  the  secondary  storage,  messages  passing,  and 
record  processing  in  the  back-ends.  Even  the  bus  in  MDBS  will  be  modelled. 
Furthermore,  such  a  model  takes  into  account  the  queueing  delays  at  each  point 
in  the  system  and  also  accounts  for  both  inter-query  and  intra-query  parallel¬ 
isms.  The  model  is  shown  in  Figure  19.  It  consists  of  two  subsystems:  the 
MDBS  subsystem  (which  consists  of  the  controller,  the  back-ends,  the  disk 
drives  and  the  bus)  and  the  terminal  subsystem.  Each  terminal  is  manned  by  a 
user  who  alternates  between  thinking  and  waiting.  In  the  thinking  state,  the 
user  is  contemplating  what  request  to  submit  next.  On  submitting  a  new  request 
to  MDBS,  the  user  enters  the  waiting  state,  where  he  will  remain  until  MDBS 
completes  the  request  execution.  The  mean  time  that  a  user  spends  in  a  thinking 
state  is  called  the  think  time  and  is  denoted  Z.  The  mean  time  that 
a  user  spends  in  a  waiting  state  is  the  response  time  of  MDBS  and  is  denoted  by 
R.  Thus,  the  mean  time  a  user  spends  at  a  terminal  is  (Z+R),  since  the  user 
is  either  thinking  or  waiting  at  the  terminal.  Furthermore,  R/(Z+R)  represents 
the  portion  of  the  time  that  a  user  is  expected  to  be  in  the  waiting  state. 

Since  there  are  M  users,  the  number  of  users  who  are  expected  to  be  in  the  wait¬ 
ing  state  is  therefore  MR/ (Z+R). 

We  note  that  the  time  spent  in  the  waiting  state  is  equivalent  to  the 
response  time  required  of  MDBS.  The  number  of  users  in  the  waiting  state  is 
therefore  equivalent  to  the  number  of  responses  (i.e.,  requests  to  be  processed 
in  MDBS).  From  now  on,  we  say  the  number  of  user  requests  in  MDBS  whenever  we 
mean  the  number  of  users  who  are  in  the  waiting  state.  Consequently,  we 
consider  the  MDBS  subsystem  alone  as  a  closed  queueing  network  model  where  the 
number  of  requests  in  the  subsystem  is  MR/(Z+R). 

Let  us  now  describe  the  closed  queueing  network  model  of  MDBS  that  is  used 
in  our  study.  The  va rlous  components  in  a  closed  queueing  network  model  are 
usually  referred  to  as  devices.  Our  model  of  MDBS  has  2(n+l)  devices.  These 
are  the  controller,  the  bus,  the  n  disk  systems  and  the  n  back-ends. 

A  separate  I/O  submodel  is  used  for  the  disk  system.  This  I/O  submodel 
is  an  integral  part  of  the  overall  model  and  will  be  discussed  in  great  detail 
in  the  following  section.  The  I/O  submodel  will  be  used  to  calculate  the  re¬ 
sponse  time  of  the  disk  system  to  an  I/O  request  for  a  track  of  data. 
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Arrows  show  path  taken  by  a  typical  request  during  its  execution. 


Figure  19.  Closed  Queueing  Network  Model  of  MDBS 
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E.l  The  I/O  Submodel  for  Single  Requests 

We  consider  the  disk  system  as  consisting  of  M  disk  drives  and  a  single 
channel.  The  I/O  submodel  consists  of  M  drive  queues  and  one  channel  queue  as 
shown  in  Figure  20.  It  is  assumed  that  requests  are  independent  and  arrive  ran¬ 
domly  in  time  at  the  drive  queues,  each  disk  drive  being  likely  addressed 
equally.  The  inter-arrival  distribution  is  chosen  as  exponential  with  L  as 
the  mean  rate  at  which  requests  are  received  by  the  disk  system.  The  inter¬ 
arrival  distribution  of  requests  to  each  drive  queue  is  also  exponential  with 
a  mean  request  rate  of  L/M.  The  assumption  of  exponential  distribution  is 
made  because  it  is  the  most  reasonable  one  in  the  absence  of  other  informa¬ 
tion  about  the  inter -arrival  distribution. 

For  a  queue,  there  is  a  certain  rate  at  which  requests  will  arrive  to  the 
queue  and  a  certain  rate  at  which  requests  in  the  queue  are  serviced.  It  is 
well-known  that  a  queue  may  be  completely  analyzed  if  we  know  the  mean  and 
variance  of  the  inter-arrival  time  of  requests  to  the  queue  and  the  mean  and 
variance  of  the  service  time  of  requests  in  the  queue.  Let  us  now  analyze  the 
drive  queues  and  the  channel  queue,  in  turn. 

Analyzing  a  Drive  Queue  -  The  drive  queue  service  time  is  the  seek  time.  The 
seek  time  of  a  disk  drive  is  approximated  by  an  equation  of  the  form  (a  +  by), 
where  a  and  b  are  constants  and  y  is  the  number  of  cylinders  traversed  dur¬ 
ing  the  seek.  We  let  N  be  the  total  number  of  cylinders  in  a  disk  drive.  Then, 


Mean  service  time  =  Mean  seek  time 

"  ,2 


y-1 


a  bN b_ 

3 "  N  3  3N 


Variance  of  service  time  =  Variance  of  seek  time 


2m2 


bzN 


a 

w 


18  18 

(See  Appendix  E  for  the  derivation) 


b2 

9N? 


IN?  +  T  + 


2ab 

3 


For  a  drive  queue,  these  are  the  only  two  quantities  we  will  need  in  our  deri¬ 
vation. 

Analyzing  the  Channel  Queue  -  For  the  channel  queue,  the  arrival  distribution 
Is  exponential  and  the  service  distribution  is  constant.  Then 
Mean  service  time  *  disk  rotation  time 
-  d  . 

Variance  of  service  time  *  0. 

Mean  Inter-arrival  time  *  ^  (See  Figure  20) . 

Variance  of  Inter-arrival  time  *  i  • 

la 


J 


mean  rate  of  request 
arrivals 


Figure  20.  Queueing  Model  of  a  Single  Channel  Disk  System 
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The  four  quantities  needed  to  analyze  the  channel  queue  are,  thus,  com¬ 
pletely  specified.  Since  the  inter-arrival  distribution  for  the  channel  queue 
is  exponential  and  the  service  distribution  for  the  channel  queue  is  non¬ 
exponential,  the  channel  queue  is  now  analyzed  as  an  M/G/l  queue  [Klei75]. 
Application  of  a  standard  queueing  theory  equation  gives  us 


Channel  waic  time 
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Let  the  time  spent  by  a  request  in  the  channel  queue  and  the  time  spent  by  a 
request  in  getting  service  after  its  removal  from  the  channel  queue  be  refer¬ 
red  to  as  the  time  spent  by  a  request  in  the  channel.  Then 
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Variance  of  time  in  channel 
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(See  Appendix  E  for  the  derivation). 

This  completes  our  analysis  of  the  channel  queue. 

Analyzing  the  Composite  Queue  -  We  shall  consider  a  drive  queue  and  channel 
queue  as  a  composite  queue.  This  is  because  after  a  disk  drive  completes  a 
seek,  it  must  acquire  the  channel  for  transfering  the  track  of  data  over  the 
channel.  Only  then  may  a  disk  drive  begin  another  seek. 


Mean  service  time  of  Composite  Queue 
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Variance  of  service  time  of  Composite  Queue 
a  v 

=  Variance  of  seek  time  +  Variance  of  time  in  channel 
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Variance  of  Inter-arrival  time  to  composite  queue 
M2 


Thus,  all  the  four  quantities  needed  to  specify  the  composite  queue  are  known. 
Since  the  inter-arrival  distribution  of  the  queue  is  exponential  and  the  ser- 
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vice  distribution  is  non-exponential,  the  composite  queue  is  an  M/G/l  queue. 
Then,  using  a  standard  queueing  theory  equation,  the  total  time  spent  by  a  re¬ 
quest  in  the  composite  queue  which  is  the  total  time  spent  by  a  request  in  the 
disk  system  and,  hence. the  response  time  for  a  request  is  r  and 
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A  Note  on  this  model  vs .  other  models  -  The  I/O  submodel  that  we  have  developed 
thus  far  is  simpler  than  other  models  of  disk  systems  in  that  we  are  able  to 
obtain  a  simple  closed-form  solution  for  the  response  time  without  relying 
on  iteration  and  transformation.  The  results  of  [Bard81]  require  the  solu¬ 
tion  of  a  set  of  simultaneous  equations  by  the  Newton-Raphson  method.  The 
method  of  [Gotl73]  for  multiple  disk  drives  also  requires  an  iteration  of  equa¬ 
tions  to  be  performed.  The  use  of  a  machine-repair  queueing  model  [Saat61] 
for  the  analysis  of  the  channel  queue  also  yields  no  closed  form  solution  for 
the  response  time.  Finally,  the  results  of  [Fran74]  require  Laplace  transfor¬ 
mations. 


E.2  The  I/O  Submodel  for  Bulk  Requests  with  Fixed  Bulk  Size 

We  now  proceed  to  make  a  very  important  extension  to  the  I/O  submodel. 

In  a  database  management  system  environment  such  as  MDBS  where  requests  are 
issued  in  a  high-level  query  language,  each  request  may  involve  several  tracks. 
Thus,  each  request  to  the  disk  system  requires  actually  the  retrieval  of  sev¬ 
eral  tracks.  In  the  sequel,  we  assume  that  requests  to  the  disk  system  arrive 
with  an  exponential  inter-arrival  distribution  at  a  mean  rate  L.  However,  each 
request  is  assumed  to  require  the  retrieval  of  T  tracks  of  data  which  are  ran¬ 
domly  distributed  on  the  disk  tracks.  This  is  a  bulk  arrival  system  in  the 
queueing  theory  terminology.  We  term  T  the  size  of  ji  bulk  request.  Sometimes, 
we  may  also  talk  of  the  T  subrequests  of  a  bulk  request. 

The  queueing  model  we  use  is  shown  in  Figure  21.  As  in  the  previous  sec¬ 
tion  for  single  requests,  we  will  consider  a  drive  queue  and  the  channel  queue 
as  a  composite  queue.  The  composite  queue  is  a  queue  with  bulk  arrivals.  In 
order  to  analyze  a  queue  with  bulk  arrivals,  we  need  to  know  the  mean  and  var¬ 
iance  of  the  service  time,  the  mean  and  variance  of  the  inter-arrival  time, 
and  the  first  and  second  moments  of  the  size  of  a  bulk  request. 

The  mean  and  variance  of  the  service  time  of  the  composite  queue,  s  and  v. 
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Figure  21.  Queueing  Model  of  a  Disk  System  with  Bulk 
Arrivals  at  Size  T  at  a  Rate  L 
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are  calculated  In  the  previous  section.  The  only  difference  is  that  L  in  the 
expressions  for  s  and  v  must  now  be  replaced  by  TL. 

The  mean  and  variance  of  the  inter-arrival  time  the  composite  queue  are 
i  and  respectively. 

L  L  .p 

The  first  moment  of  the  size  of  a  bulk  request  is  The  second  moment 
of  the  size  of  a  bulk  request  at  each  module  is  calculated  by  assuming  a 
Bernoulli  trial  as  follows.  There  are  T  subrequests  in  a  bulk  request  received 
by  the  disk  system.  For  each  of  these  T  subrequests,  there  is  a  1/M  probabi¬ 
lity  that  it  will  be  assigned  to  a  particular  drive.  Then,  the  probability 
that  i  requests  out  of  T  will  be  assigned  to  the  same  module  is: 

(I)  (i)‘«-i)T-‘ 

Thus,  the  second  moment  of  the  bulk  size  of  a  request  at  a  module  is 

j*2  ( D 

This  is  simplified  to  (T/M)(l-1/M  +  T/M). 

All  the  six  quantities  needed  to  analyze  the  composite  queue  are  now  known. 
The  total  response  time  may  now  be  calculated  using  the  following  formula  from 
[Saat61] . 

r  *  2a^y  <?•  +  i?>  +  8 

where 

q  =  L*s*r ' , 
r '  -  T/M, 

t  -  (T/M)  (1-  1/M  +  T/M). 

Replacing  T  with  1,  we  get  the  same  result  that  we  got  for  the  previous  analy¬ 
sis  when  bulk  requests  were  not  assumed.  Hence,  we  have  reasons  to  believe 
that  our  analysis  is  correct. 

E.3  The  I/O  Submodel  for  Bulk  Requests  with  Variable  Bulk  Size 

We  now  make  one  final  variation  in  our  assumptions  for  modelling  the  I/O 
system  of  MDBS.  In  the  above  analysis,  we  assumed  that  the  size  of  a  bulk 
request,  as  issued  to  the  disks  of  a  back-end,  was  fixed  at  T.  More  generally, 
a  request  to  MDBS  will  require  the  retrieval  of  T  tracks.  However,  the  number 
of  T  tracks  that  will  be  retrieved  at  any  one  back-end  varies  from  1  to  T. 
Therefore,  we  need  to  assume  that  each  disk  system  (one  disk  system  is  asso- 
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clated  with  each  back-end)  may  receive  a  variable  number  of  subrequests  with 
each  bulk  request.  That  is,  the  size  of  the  bulk  requested  will  vary  from  1 
to  T.  If  there  are  n  back-ends,  then  the  mean  size  of  the  bulk  requested  at 
each  disk  system  is  T/n  and  the  variance  of  this  bulk  size  is  (T/n) (1  -  1/n) . 
This  explains  the  queueing  model  shown  in  Figure  22.  The  mean  and  variance 
of  the  bulk  size  of  each  request  as  received  by  each  module  must  now  be  cal¬ 
culated.  This  may  be  calculated  as  follows. 

The  probability  that  there  are  j  subrequests  of  the  bulk  request  at  a 
drive  is  equal  to  the  probability  that  i  out  of  T  requests  are  first  chosen 
to  be  sent  to  one  of  the  back-ends  multiplied  by  the  probability  that  j  out 
of  these  i  requests  are  sent  to  a  particular  drive.  Then,  the  mean  bulk  size 
at  a  drive  is: 


i-j 


_T_ 
Mn  ‘ 


Similarly,  the  second  moment  of  bulk  size  at  a  drive  is: 


-j 


_  i  ,,  X  X  ,  X- 
Mn  (1 ~  Mn^  +  M^n? 

The  final  result  we  are  after  may  now  be  obtained  by  starting  the  equation 


developed  for  the  bulk  request  with  fixed  bulk  size.  We  let  r'  =  — , 


_T_n  +  T2 

Mn  U  Mn'  M2n2 


and  L  in  the  expressions  for  s  and  v  be  replaced  by 


LT 


is  the  result  we  shall  use  for  the  response  time  of  the  disk  system.  Thus, 


2a-'q)  (p- +  + 


where 

q  =  Lsr' 

.  T 
Mn  ’ 


JL  n-X)  + 

Mn  v  Mn'  M2n2  * 


This 


F.  Modelling  the  Eight  Strategies  for  Evaluation 

Having  described  the  I/O  submodel  thoroughly,  we  are  now  ready  to  describe 
the  overall  closed  queueing  network  model  of  which  the  I/O  submodel  is  a  part. 
We  will  develop  a  separate  closed  queueing  network  model  for  each  of  the  eight 


Mean  bulk  size  =  — 
n 

T  .  1 

variance  of  bulk  size  =  —  (1  - 
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Figure  22.  Queueing  Model  of  a  Disk  System  with  Bulk 
Arrivals  of  Variable  Bulk  Size 
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strategies. 

nology . 

tparse 

tdir 

tdire 

tdirf 

tidrg 

tm 

adgen 

adgenl 

T 

y 

thus 

tbus2 

tout 

tproc 


We  are 
strategies . 


For  ease  in  describing  these  models,  we  use  the  following  termi- 
time  to  parse  a  user  request. 

time  to  do  descriptor  processing  in  Strategies  A,  B,  C,  D  and  H. 

time  to  do  descriptor  processing  in  Strategy  E. 

time  to  do  descriptor  processing  in  Strategy  F. 

time  to  do  descriptor  provessing  in  Strategy  G. 

time  to  generate  a  message. 

time  for  address  generation  in  the  controller. 

time  for  address  generation  in  the  back-ends. 

number  of  tracks  to  be  retrieved  for  a  typical  request. 

average  number  of  predicates  in  a  user  query. 

time  to  send  a  user  request  over  the  bus. 

time  to  send  retrieved  records  over  bus. 

time  taken  by  controller  to  output  retrieved  records  to  user, 

time  taken  by  a  back-end  to  check  a  track  of  records  against 

a  user  query. 

now  ready  to  describe  the  eight  models  for  the  eight  different 


F.l  The  Centralized  Model 

Consider  the  sequence  of  execution  of  a  typical  request.  The  request  is 
first  processed  at  the  controller.  The  controller  must  parse  the  request  and 
this  will  take  tparse  time  units.  Next,  descriptor  processing  must  be  per¬ 
formed  on  the  query  in  the  request.  We  have  already  shown  how  the  time  for 
descriptor  processing  (tdir)  may  be  calculated  for  the  various  strategies. 

Next,  the  controller  must  generate  the  necessary  secondary  memory  addresses 
using  the  augmented  CDT.  This  will  take  adgen  time  units.  Finally,  the  con¬ 
troller  must  broadcast  the  addresses  to  the  back-ends  which  will  take  tm  time 
units.  Thus,  the  total  time  taken  to  service  the  request  at  the  controller  is 
(tm  +  tdir  +  adgen  +  tparse).  Since  eacn  back-end  must  receive  a  copy  of  the 
request,  the  controller  effectively  serves  n  requests,  where  the  service  time 
of  a  request  is  (tm  +  tdir  +  tparse  +  adgen)/n. 

The  request  is  now  sent  over  the  bus  to  the  back-ends.  The  bus  service 

time  for  request  processing  is  thus.  Since  n  copies  of  the  same  request  are 

sent  to  n  respective  back-ends,  the  bus  effectively  serves  n  requests.  Thus,  the 

thus 

service  time  is  really  •  The  request  is  now  received  by  the  back-ends 

which  use  the  respective  disK  systems  to  retrieve  relevant  tracks.  If  the 
average  number  of  tracks  to  be  retrieved  for  a  request  is  T,  the  average  num¬ 
ber  of  tracks  that  has  to  be  retrieved  at  each  back-end  is  T/n.  The  service 
time  of  a  disk  system  responding  to  these  T/n  1/0  requests  is  obtained  from 


the  I/O  submodel  described  in  the  previous  section. 

The  retrieved  tracks  are  now  being  checked  against  the  user's  query. 

It  is  assumed  that  tproc  msecs  have  to  be  spent  by  a  back-end  in  processing 

a  track  of  records  against  a  user's  query.  The  records  satisfying  the  query 

are  now  sent  over  the  bus  to  the  controller.  The  service  time  of  the  bus  in 

this  case  is  assumed  to  be  tbus2  (to  be  ^.^>u  when  there  are  n  requests)  .  Fi- 

n 

nally,  the  controller  must  output  the  records  to  the  user  which  takes  tout 
time  units.  A  good  approximation  of  tout  is  tm,  the  time  to  send  a  message. 

As  before,  it  is  convenient  to  assume  that  actually  n  separate  results  are  sent 
to  the  user  with  an  average  service  time  of  — . 

In  some  cases,  some  of  the  devices  (i.e.,  components  of  MDBS)  have  more 
than  one  queue.  For  instance,  the  controller  has  two  queues.  The  first  queue 
contains  requests  that  must  be  parsed,  processed,  and  so  on.  The  second  queue 
contains  a  group  of  records  waiting  to  be  output  to  the  user.  Such  cases  are 

handled  as  separate  classes  in  our  model.  Thus,  there  are  two  classes  of  re¬ 

quests  at  the  controller.  Let  S(a,b)  be  the  service  time  f  >r  the  class  _b  request 
at  device  a.  Similarly,  let  V(a,b)  be  the  visit  ratio  of  the  i :ass  b  request  at 
device  £  which  is  defined  as  the  number  of  service  completions  of  class  b  re¬ 
quests  at  device  a  for  each  service  completion  from  MDBS.  A  closed  queueing 
network  is  completely  specified  when  V(a,b)  and  S(a,b)  are  specified  for  all 
classes  and  for  all  devices.  Then,  the  results  of  [Rood79]  may  be  used  to 
calculate  the  response  times,  average  queue  lengths  and  utilizations  on  a  per 
class  and  per  device  basis.  Let  us  specify  V(a,b)  and  S(a,b)  for  our  model. 

Let  the  controller  be  device  1,  the  bus  be  device  2,  a  disk  system  be  device  3 
and  a  back-ervd  be  device  A. 

There  are  two  classes  of  requests  at  the  controller  (device  1),  The  queue 
of  requests  waiting  to  be  parsed,  processed,  and  so  on,  is  designated  as  class  1 

at  device  1.  The  queue  of  records  waiting  to  be  output  by  the  controller  to 

the  user  is  designated  as  class  2  at  device  1.  The  bus  (device  2)  also  has 
two  classes  of  requests.  The  queue  of  messages  from  the  controller  to  the 
back-ends  is  designated  as  class  1  at  device  2.  The  queue  of  records  sent  from 
the  back-ends  to  the  controller  to  be  output  to  the  user  is  designated  as 
class  2  at  device  2.  The  disk  system  (device  3)  has  only  one  request  class. 

This  is  the  queue  of  I/O  requests  for  retrieval  of  tracks  of  records  at  speci¬ 
fied  addresses  which  is  designated  as  class  1  at  device  3.  Finally,  each  back¬ 
end  (device  A)  has  only  one  request  class  and  this  is  the  queue  of  records  re- 
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n 

tbus 

n 

tbus2 


trieved  by  the  disk  system  which  are  waiting  to  be  checked  against  the  user's 
query.  This  queue  is  designated  as  class  1  at  device  4.  In  some  of  the  later 
models,  we  will  see  that  a  back-end  may  have  two  more  request  classes,  making 
a  total  of  up  to  three  request  classes  at  a  back-end.  If  a  model  has  two  or 
more  request  classes  at  a  back-end,  the  following  notation  is  employed.  The 
queue  of  requests  waiting  for  address  generation  is  designated  as  class  1  at 
device  4.  The  queue  of  records  retrieved  by  the  disk  system  which  are  waiting 
to  be  checked  against  the  user's  query  is  designated  as  class  2  at  device  4. 
Finally,  the  queue  of  requests  waiting  for  descriptor  processing  is  designated 
as  class  3  at  device  4.  This  completes  our  discussion  of  the  notation  which 
will  be  followed  for  this  model  and  for  the  models  for  all  the  other  strategies 
also.  The  S  and  V  matrices  for  the  centralized  model  is  as  below. 

S(l,l)  =  (tdir  +  tm  +  tparse  +  adgen)/n 

S (1, 2)  =  t0ut 

S(2,l)  = 

s(2,2)  = 

S(3,l)  =  ? 

S(4,l)  =  tproc 

V(l,l)  =  n 

V(1 , 2)  =  n 

V(2,l)  =  n 

V(2 , 2)  =  n 

V(3,l)  =  J 

V(4,l)  -  I 

Note  that  we  have  put  a  question  mark  for  S(3,l),  the  service  time  of  the  disk 
system.  The  T/0  submodel  will  be  used  to  calculate  S(3,l). 

F.2  The  Partially  Centralized  Model 

The  notation  we  developed  for  the  previous  strategy  will  now  be  used  to 
explain  the  remaining  models.  Let  us  consider  the  sequence  of  execution  of  a 
particular  request  in  the  partially  centralized  strategy.  First,  the  controller 
parses  the  request  and  then  performs  descriptor  processing  on  the  request. 
Finally,  it  broadcasts  the  corresponding  descriptors  to  all  the  back-ends.  Thus, 
the  service  time  for  the  request  at  the  controller  is  (tdir  +  tparse  +  tm) .  As 
before,  we  assume  that  there  are  n  requests  and  that  the  service  time  for  a  re¬ 
quest  't  tiie  controller  is  — -^-r-  -  ><irs— -  1 .  The  request  now  goes  over  the 

bus  with  a  service  time  of  ' U"  to  the  back-ends. 
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The  back-ends  perform  address  generation  taking  adgenl  units  of  time. 

We  use  adgenl  rather  than  adgen  as  in  the  previous  strategy  because  the  address 
generation  in  this  strategy  and  in  all  the  remaining  strategies  is  different 
from  the  address  generation  in  the  centralized  strategy. 

Next,  the  request  is  presented  by  a  back-end  to  its  disk  system  for  access¬ 
ing  the  relevant  tracks.  The  retrieved  data  are  then  processed  in  the  back-ends 
taking  a  service  time  of  tproc  units.  The  results  are  then  sent  back  over  the 

bus  to  the  controller.  The  bus  service  time  is  again  tbus2.  Finally,  the  re- 

n  tout 

suits  must  be  returned  to  the  user  and  the  controller  takes  -  time  units  for 

n 

this.  The  model  may  be  specified  by  specifying  its  S  and  V  entries  as  below. 


S(l,l) 


(tdir  +  tparse  +  tm) 
n 


S (1, 2) 
S(2,l) 

S(2,2) 

S(3,l) 

S(4,l) 

S(4,2) 


tout 

n 

tbus 

s  ■ . 

n 

_  tbus2 
n 

=  ? 

*  adgenl 
=  tproc 


V(l,l)  =  n 
V(l,2)  =  n 
V(2,l)  =  n 
V(2,2)  =  n 

V(3,l)  =  J 

V(4,l)  =  1 

V(4 ,2)  =  - 


F.3  The  Rotating  Model 

Consider  the  sequence  of  execution  of  a  typical  request.  Two  possibili¬ 
ties  exist  depending  on  whether  the  descriptor  processing  is  done  at  the  con¬ 
troller  or  at  one  of  the  back-ends.  If  the  descriptor  processing  is  done  at 
the  controller,  then  the  service  time  at  the  controller  will  be 

(tparse  +  t.  b  tdir)..  mllMwl  the  service  time  will  be  (tpar.e  a  t»)  _ 
n  n 

Since  there  are  two  possibilities  of  request  execution  in  this  model  as  com¬ 
pared  to  the  single  request  execution  possibility  in  the  other  models,  an 
additional  request  class  is  Introduced  at  the  controller  for  this  model  and 
this  Is  designated  class  3  at  device  1.  The  request  is  now  broadcast  over  the 
bus  with  a  service  time  of  .  If  descriptor  processing  has  not  already  been 
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done  at  the  controller,  then  it  must  be  done  at  the  back-ends,  and  this  takes 
(tdir  +  tm)  units.  The  additional  tm  units  is  needed  to  communicate  the  re¬ 
sults  to  all  the  back-ends.  Next,  the  back-ends  perform  address  generation 
taking  adgenl  time  units.  The  disk  systems  are  activated  at  this  time  to  ac¬ 
cess  tracks.  The  retrieved  data  are  then  processed  by  the  back-ends  with  a 

service  time  of  tproc.  Records  satisfying  the  request  are  sent  over  the  bus 

tbu<?2  tout 

taking  _ —  time  units  and  finally,  the  controller  spends  -  time  units 

n  u 

outputting  the  results  to  the  user.  The  model  may  be  specified  by  the  fol¬ 
lowing  service  times  and  visit  ratios. 


S(l,l) 

S(l,2) 

S(l,3) 

S(2,l) 

S(2,2) 

S(3,l) 

S(4,l) 

S(4,2) 

S(4,3) 


(tdir  +  tparse  +  tm) 
n 

tout 

n 

(tparse  4-  tm) 
n 

thus 

n 

tbus2 

n 

1 

adgenl 

tproc 

(tdir  +  tm) 


V(l-l)  =  ^ 

V  (1 , 2)  =  n 

V^*3> 

V  (2,1)  =  n 
V(2,2)  =  n 

V(3,l)  =  J 

V(4,l)  =  1 

V (4,2)  =  I 

V(4,3)  =  ~ 


F.4  The  Rotating  Without  Controller  Model 

Consider  the  execution  sequence  of  a  typical  request.  The  controller  does 
parsing  on  a  request  before  broadcasting  it  to  all  the  back-ends.  The  control¬ 
ler  service  time  is  —  P— r--e  +-  t-Tn^ .  The  request  is  now  broadcast  over  the  bus 

with  a  service  time  of  - -  and  arrives  at  the  back-ends.  One  of  the  back-ends 

n 

will  do  the  descriptor  processing  for  this  request.  The  visit  ratio  of  the 
descriptor  processing  queue  at  each  back-end  must  be  adjusted  to  reflect  the 
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fact  that  only  one  out  of  every  n  user  generated  requests  will  need  descriptor 

processing  at  that  back-end.  The  back-end  that  does  descriptor  processing 

must  then  communicate  the  results  to  all  the  back-ends.  Hence,  the  service 

time  of  this  queue  in  the  back-end  is  (tdir  +  tm).  Then,  all  the  back-ends 

will  perform  address  generation  independently,  taking  adgenl  time  units.  Next, 

the  disk  system  at  a  back-end  will  retrieve  —  tracks  of  data.  These  tracks  of 

data  are  checked  by  the  back-ends  against  the  user’s  query  taking  tproc  units  of 

,  tbus2 

time  per  track.  The  results  are  shipped  over  the  bus  taking  — ^ —  time  units. 
Finally,  the  controller  outputs  the  results  to  the  user  taking  time  units. 

The  model  is  entirely  specified  by  specifying  the  service  times  and  visit  ratios 


as  below. 


S(l,l) 

S(l,2) 

S(2,l) 

S(2,2) 

S(3,l) 

S(4,l) 

S(4,2) 

S(4,3) 


(tparse  +  tm) 
n 

tout 

n 

thus 

n 

tbus2 

n 

? 

adgenl 
tproc 
tdir  +  tm 


V(l,l)  =  n 
V(l,2)  =  n 
V(2,l)  =  n 
V(2,2)  =  n 

V(3,l)  -  l 
V(4,l)  =  1 
V(4,2)  =  I 

V(4,3)  =  i 


F.5  The  Fully  Duplicated  Model 

This  model  differs  from  the  previous  one  only  in  the  way  descriptor  pro¬ 
cessing  is  done.  Instead  of  each  back-end  doing  the  descriptor  processing  for 
one  out  of  every  n  requests,  each  back-end  will  do  a  portion  of  the  descriptor 
processing  for  every  request.  The  back-ends  must  then  exchange  their  results. 
The  service  times  and  visit  ratios  are  shown  below  for  this  model: 


S(l,l) 

S(l,2) 


(tparse  +  tm) 
n 

tout 

n 


<?l?  * 
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S(2,l) 

S(2,2) 

S(3,l) 

S(4,l) 

S(4,2) 

S(4,3) 


thus 

n 

thus  2 
n 

? 

adgenl 

tproc 

y*tdire  , 
j. - ■+■  tm 

n 


V(l,l)  =  n 
V(l,2)  =  n 
V(2,l)  =  n 
V(2,2)  =  n 

V(3,l)  =  J 

V(4,l)  =  1 

V(4,2)  =  J 

V(4,3)  =  1 


F.6  The  Descriptors  Dividing  by  Attribute  Model 

The  only  difference  between  this  and  the  previous  model  is  that  tdire  is  re¬ 
placed  by  tdirf  in  the  expression  for  S(4,3).  The  value  tdirf  has  been  cal¬ 
culated  in  a  previous  section  and  is  the  time  for  doing  descriptor  processing 
on  a  single  predicate  by  using  Strategy  F.  Each  back-end  is  expected  to 
handle  ^  out  of  the  y  predicates  in  a  user  query. 

F.7  The  Descriptors  Division  Within  Attribute  Model 

Once  again,  this  model  is  very  similar  to  that  for  Strategy  E  and  may  be 
obtained  from  there  by  setting  S(4,3)  equal  to  (y*tdirg  +  tm) .  Once  again, 
tdirg  is  the  time  for  descriptor  processing  on  a  single  predicate  using  Stra¬ 
tegy  G  and  has  been  calculated  in  an  earlier  section.  Note  that,  unlike  the 
previous  strategies,  all  y  of  the  predicates  in  a  query  are  handled  at  each 
back-end . 

F.8  The  Fully  Replicated  Model 

Consider  the  sequence  of  execution  of  a  typical  request.  The  controller 
first  parses  the  request  and  broadcasts  it  to  all  the  back-ends.  Thus,  the 
controller  service  time  is  .  The  requests  go  over  the  bus  with 

an  average  service  time  of  The  requests  are  now  received  at  each  back¬ 

end.  Each  back-end  will  first  perform  descriptor  processing  and  then  address 

T 

generation  on  a  request.  Next,  —  track  retrieve  requests  are  submitted  to  the 
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dlsk  system.  The  retrieved  records  are  searched  against  the  user's  query  in 
the  controller  and  this  takes  tproc  time  units  for  each  track  of  records.  The 
qualified  records  are  now  returned  via  the  bus  taking  a  service  time  of 
tb^s2  and  are  then  output  to  the  user  by  the  controller  taking  a  service  time 
of  t<^t .  The  complete  specification  of  the  service  times  and  visit  ratio  is 
shown  below: 


S(l,l) 

S(l,2) 

S(2,l) 

S(2,2) 

S(3,l) 

S(4,l) 

S(4,2) 


(tparse  +  tm) 
n 

tout 

n 

tbus 

n 

tbus2 

n 

? 

tdir  +  adgenl 
tproc 


V(l,l)  =  n 
V(l, 2)  =  n 
V(2,l)  =  n 
V(2,2)  =  n 

V(3,l)  =  | 
V(4,l)  -  1 
V(4,2)  =  J 


This  completes  our  specification  of  the  variot  s  models  to  be  employed. 
The  only  question  to  be  answered  concerns  the  incorporation  of  the  I/O  sub¬ 
model  into  the  overall  models.  We  shall  explain  the  procedure  employed  in 
order  to  incorporate  the  I/O  submodel  into  the  overall  models  in  the  next 
section. 


G.  Results  of  the  Queueing  Network  Modeling  of  Strategies 

In  order  to  solve  any  of  the  eight  closed  queueing  network  models  of  MDBS, 
the  service  times  and  visit  ratios  of  all  the  devices  in  the  network  must  be 
specified.  We  have  specified  the  visit  ratios  of  all  the  devices  including  the 
disk  systems.  However,  the  service  times  of  the  disk  systems  have  not  yet 
been  specified.  In  this  section,  we  will  describe  how  the  service  time  of  a 
disk  system  may  be  calculated. 

In  Section  K.l,  we  have  an  expression  for  the  mean  service  time  of  the  com¬ 
posite  queue,  in  a  disk  system,  in  terms  of  L,  the  arrival  rate  of  requests 
to  the  disk  system.  However,  we  did  not  know  then  the  value  of  L.  Therefore, 
a  technique  for  calculating  L  and,  hence,  s,  is  developed  herein.  We  use  the 
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following  iterative  procedure.  The  following  values  are  used  for  the  various 
parameters  in  our  calculations, 
tparse  :  20  msecs 

tdir  :  calculated  using  equation  (2)  of  Section  A 

tdire  :  calculated  using  equation  (3)  of  Section  A 

tdirf  :  calculated  using  equation  (4)  of  Section  A 

tdirg  :  calculated  using  equation  (5)  of  Section  A 

tm  :  8  msecs 

T  :  18 

thus  :  .01  msecs 

tbus2  :  16  msecs 

tout  :  8  msecs 

tproc  :  10  msecs 

adgenl  :  calculated  using  equation  (1)  of  Appendix  F 
adgen  :  calculated  using  equation  (2)  of  Appendix  F 

The  number  of  back-ends  is  taken  from  the  set  {3,  6,  9}  and  the  number 
of  requests  in  the  system  is  taken  from  the  set  {5,  10,  15}. 

Assume  an  initial  value  for  L,  the  arrival  rate  of  requests  to  the  disk 
system  as  follows: 

(seek  time  +  rotation  time) 

Using  this  value  of  L,  calculate  the  value  of  s,  the  disk  system  service  time 
and  r,  the  disk  system  response  time  using  the  expressions  derived  in  Sections 
E.l  and  E.3,  respectively.  With  s,  the  closed  queueing  network  model  is  com¬ 
pletely  specified.  It  may  then  be  used  to  calculate  utilizations,  response 
times,  throughputs  and  average  queue  lengths  on  a  per  class  and  per  device 
manner.  In  particular,  the  response  time  of  the  disk  system  modelled  by  the 
closed  queueing  network  can  be  calculated.  The  value  of  the  disk  system  re¬ 
sponse  time  obtained  from  the  closed  queueing  network  model,  R  is  compared  to 
that  obtained  from  the  I/O  submodel,  r.  If  the  two  are  less  than  one  milli¬ 
second  apart,  the  iteration  stops.  Otherwise,  the  value  of  L  is  modified  to 
be  equal  to  the  throughput  Tp  of  the  disk  system  calculated  from  the  closed 
queueing  network  model.  The  iteration  procedure  is  now  repeated  until  the 
small  difference  between  R  and  r  is  obtained.  In  our  experiments  with  this 
iteration  technique,  we  found  that  no  more  than  three  iterations  were  needed. 
That  is,  with  the  chosen  initial  value  of  L,  the  convergence  was  extremely 
fast.  With  the  closed  queueing  network  models  for  the  various  directory 
processing  strategies  being  completely  specified,  we  now  present  the  results 
of  our  experiments.  The  results  for  strategies  A  to  D  are  presented  in 
Figures  23,  24,  25  and  26,  respectively.  Since  the  results  for  Strategies 
E  and  F  are  identical,  they  are  presented  in  Figure  2?.  Finally,  the  re¬ 
sults  for  Strategies  G  and  H  are  presented  in  Figures  28  and  29,  respec- 
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Figure  23.  Queueing  Network  Model  Results  for  Strategy  A 
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l-’lgure  24.  Queueing  Network  Model  Results  for  Strategy  B 
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Figure  25.  Queueing  Network  Model  Results  for  Strategy  C 
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Figure  26.  Queueing  Network  Model  Results  for  Strategy  D 
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Figure  27.  Queueing  Network  Model  Results  for  Strategies  E  and  F 
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20.972 

Number  of  Requests  =  15 
Number  of  Back-ends  =  6 
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Response 

Time 
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.2321 

6.077 

.0663 

1.428 

.0001 

0.002 

.1326 

3.074 

1.0000 

490.343 

.1528 

21.749 

.0249 

1.026 

.1482 

20.989 

Utilization 

Response 

Time 

(msecs) 

.2545 

3.933 

.0727 

.0001 

0.001 

.1454 

2.025 

.7442 

86.998 

.1010 

12.160 

.0182 

1.016 

.1625 

20.696 

Number  of  Requests  =  10 
Number  of  Back-ends  =  9 


Utilization 

Response 

Time 

(msecs) 

4.585 

0.980 

.0001 

0.001 

.1906 

2.183 

.9753 

216.819 

.1324 

12.768 

.0238 

1.024 

.2129 

22.545 

Number  of  Requests  *  15 
Number  of  Back-ends  *  9 


Utilization 

Response 

Time 

(msecs) 

.3417 

4.720 

.0976 

0.985 

.0001 

0.001 

.1952 

2.208 

.9992 

409.053 

.1357 

12.860 

.0244 

1.025 

.2182 

22.857 

Figure  28.  Queueing  Network  Model  Results  for  Strategy  G 
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Number  of  Requests  =  5 
Number  of  Back-ends  =  3 


Utilization 

Response 

Time 

(msecs) 

.1175 

10.552 

.0336 

2.758 

0.0000 

0.003 

.0671 

5.712 

.9963 

164.279 

.1546 

43.423 

.0252 

1.026 

Number  of  Requests  =  10 
Number  of  Back-ends  =  3 
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i-  - - 

Response 

Time 

(msecs) 

.1179 

.0337 

0.0000 

.0674 

1.0000 

.1552 

.0253 

10.581 

2.760 

0.003 

5.719 

361.281 

43.608 

1.026 

Number  of  Requests  =  15 
Number  of  Back-ends  =  3 
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Time 
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10.581 

.0337 

2.760 

0.0000 

0.003 

.0674 

5.719 

1.0000 

559.128 

.1552 

43.608 

.0253 

1.026 

Number  of  Requests  =  5 
Number  of  Back-ends  =  6 
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Time 
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.2148 
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.0614 

1.413 

.0001 

0.002 

.1228 

3.003 
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120.384 

.2172 

35.150 

.0230 
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Number  of  Requests  =  10 
Number  of  Back-ends  =  6 
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Time 

(msecs) 

.2318 

6.068 

.0662 

1.428 

.0001 

0.002 

.1325 

3.073 

.9988 

301.412 

.2344 

36.928 

.0248 

1.025 

Number  of  Requests  =  15 
Number  of  Back-ends  =  6 
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Time 
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6.077 

.0663 

1.428 

.0001 

0.002 

.1326 

3.074 
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36.983 
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Number  of  Requests  ■  5 
Number  of  Back-ends  •  9 
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.2769 
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0.001 
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Number  of  Requests  =  10 
Number  of  Back-ends  *  9 
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Time 
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.0001 

0.001 
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244.849 
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27.925 
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Number  of  Requests  *  15 
Number  of  Back-ends  =  9 
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Response 

Time 
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0.001 
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.9997 
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28.224 
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Figure  29.  Queueing  Network  Model  Results  for  Strategy  H 
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tively.  Each  of  these  figures  contains  two  columns,  one  for  the  utilization 
and  one  for  the  response  time.  These  two  measures  are  recorded  on  a  per-class- 
and-per-device  basis.  The  order  of  presentation  of  the  results  within  a  figure 
for  a  particular  strategy  is  the  same  as  the  order  of  presentation  of  the  ser¬ 
vice  time  and  visit  ratio  in  the  description  of  the  model  for  that  Strategy. 

For  instance,  in  the  description  of  the  model  for  Strategy  A,  we  first  pre¬ 
sented  the  service  time  and  visit  ratio  for  class  1  at  device  1.  Thus,  in 
Figure  23,  the  first  row  gives  the  utilization  and  response  time  for  class  1 
at  device  1.  In  reading  these  figures,  we  should  keep  in  mind  that  device  1 
is  the  controller,  device  2  is  the  bus,  device  3  is  a  disk  system  and  device 
4  is  a  back-end . 

We  also  present  a  comparative  set  of  results  for  the  eight  different  stra¬ 
tegies  in  Figure  30.  Each  entry  in  this  figure  represents  the  overall  response 
time  of  MDBS  using  one  of  these  eight  strategies.  Most  of  the  discussion  that 
follows  is  based  on  the  results  shown  in  this  figure.  However,  in  order  to 
explain  some  of  the  results  shown  in  this  figure,  we  will  have  to  refer  back 
to  the  results  presented  in  Figures  23  through  29. 

As  expected,  the  centralized  strategy  (i.e..  Strategy  A)  is  the  worst  of 
the  lot.  It  is  consistently  the  worst  strategy  over  the  entire  range  of  val¬ 
ues  for  the  number  of  requests  and  the  number  of  back-ends.  There  are  situa¬ 
tions  when  the  response  time  obtained  by  use  of  the  centralized  strategy  is 
almost  50%  higher  than  when  using  some  of  the  other  strategies.  For  example, 
when  the  number  of  back-ends  is  taken  as  nine  and  the  number  of  requests  is 
fifteen,  the  response  time  of  the  centralized  strategy  is  1255.16  msecs.  On 
the  other  hand,  the  response  time  using  Strategy  E,  the  fully  duplicated  stra¬ 
tegy,  is  only  810  msecs.  The  reason  for  the  poor  performance  of  the  central¬ 
ized  strategy  is,  as  we  have  expected,  owing  to  the  fact  that  the  controller 
is  becoming  a  bottleneck.  For  the  particular  numbers  of  requests  and  numbers 
of  back-ends  under  consideration,  the  use  of  Strategy  E  will  result  in  a  mere 
35%  utilization  of  the  controller.  This  figure  is  obtained  by  adding  the  first 
two  entries  in  the  utilization  column  of  Figure  27.  On  the  other  hand,  the 
use  of  the  centralized  strategy  lead  to  a  controller  utilization  as  high  as  90% 
which  is  obtained  by  adding  the  first  two  entries  in  the  utilization  column  of 
Figure  23. 

Hie  partially  centralized  strategy  (i.e.,  Strategy  B)  was  consistently 
better  than  the  centralized  strategy.  The  response  time  improvement  of  the 
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RESPONSE  TIME  TABLES 
(In  msecs) 


Number  of  requests  =  5 


Strate¬ 

gies 

if  of 

back-ends 

A 

B 

C 

D 

E 

F 

G 

H 

3 

1173.7067 

1111.5368 

1106.0955 

1093.1209 

1076.7827 

1076.7826 

1055.9013 

1087.1987 

6 

617.087 

503.2648 

490.2142 

484.078 

428.6259 

428.6259 

421.2351 

458.5127 

9 

467.0594 

334.3705 

315.5591 

313.2972 

263.8941 

263.8941 

249.3356 

284.3019 

Number  of  requests  =»  10 


Strate¬ 

gies 

a  of 

back-ends 

A 

B 

C 

D 

E 

F 

G 

H 

3 

2356.7506 

2295.8904 

2288.7749 

2277.5511 

2215.9679 

2215.9679 

2237.8885 

2269.5116 

6 

1194.2616 

1080.4319 

1068.6451 

1061.9707 

905.9084 

905.9084 

956.0146 

1005.6126 

9 

858.1617 

701.0841 

682.5620 

681.4713 

509.5228 

509.5228 

517.1716 

588.9918 

Number  of  requests  «  15 


Strate¬ 

gies 

#  of 

back-ends 

A 

B 

C 

D 

E 

F 

G 

H 

3 

3543.8065 

3482.9716 

3475.8566 

3464.6326 

3339.5784 

3339.5784 

3424.9681 

3456.5894 

6 

1792.7510 

1683.2213 

1671.5949 

1664.8886 

1424.3426 

1424.3426 

1557.2938 

1607.0621 

9 

1255.1678 

1108.0571 

1090.8745 

1089.6635 

810.6118 

810.6118 

903.2192 

985.4986 

Figure  30.  MDBS  Response  Times  Under  Various 
Directory  Management  Strategies 
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partially  centralized  strategy  over  the  centralized  strategy  is  most  evident 
when  the  number  of  back-ends  is  very  large.  For  example,  when  the  number  of 
back-ends  is  nine  and  the  number  of  requests  is  five,  the  centralized  strategy 
has  a  response  time  of  467.06  msecs  and  the  partially  centralized  strategy  has 
a  response  time  of  334.37  msecs  which  is  an  improvement  of  about  40%.  The 
reason  that  the  improvement  is  more  visible  at  larger  number  of  back-ends  is 
because  this  is  precisely  when  the  controller  in  the  centralized  strategy  be¬ 
comes  highly  utilized.  In  other  words,  the  partially  centralized  strategy  is 
a  more  extensible  strategy  than  the  centralized  strategy.  Comparing  corre¬ 
sponding  entries  in  Figures  23  and  24,  we  see  that  the  controller  utilization 
under  the  partially  centralized  strategy  is  1/2  the  controller  utilization 
under  the  centralized  strategy. 

Next,  let  us  consider  the  results  of  using  the  rotating  strategy,  (i.e., 
Strategy  C) .  As  we  recall,  the  rotating  strategy  is  an  improvement  over  the 
partially  centralized  strategy  because  the  descriptor  processing  is  shared 
by  the  back-ends  and  the  controller,  instead  of  being  done  entirely  in  the 
controller.  The  results  bear  out  the  fact  that  the  rotating  strategy  is  in¬ 
deed  superior  to  the  partially  centralized  strategy,  since  it  consistently  out¬ 
performs  the  partially  centralized  strategy,  in  terms  of  response  time,  over 
the  entire  range  of  values  for  the  number  of  back-ends  and  the  number  of  re¬ 
quests.  The  improvements  are  not  dramatic.  The  improvement  in  average  re¬ 
sponse  time  is  in  the  order  of  ten  or  twenty  milliseconds.  The  largest  improve¬ 
ment  occurs  when  the  number  of  back-ends  is  the  largest.  This  implies,  of 
course,  that  the  rotating  strategy  is  more  extensible  than  the  partially  cen¬ 
tralized  strategy  and,  hence,  more  preferable  to  us  as  designers  of  an  exten¬ 
sible  system. 

Next,  let  us  consider  the  results  of  using  the  rotating  without  controller 
strategy  (Strategy  D) .  The  rationale  of  using  this  strategy  was  that  it  might 
provide  some  improvement  over  the  rotating  strategy  because  of  the  fact  that 
it  serves  to  alleviate  the  controller  limitation  problem  to  a  greater  degree. 
This  is  because  the  controller  is  no  longer  involved  in  descriptor  processing. 
The  results  of  Figure  30  bear  this  out  to  some  degree.  In  fact,  the  rotating 
without  controller  strategy  provides  a  better  response  time  than  the  rotating 
strategy  over  the  entire  range  of  values  for  the  number  of  requests  and  the 
number  of  back-ends.  However,  the  improvement  of  this  strategy  over  the  ro¬ 
tating  strategy  is  only  marginal.  Thus,  it  is  seen  that  the  improvement  pro- 
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vided  by  this  strategy  over  the  rotating  one  is  never  more  than  about  ten 
milliseconds.  Furthermore,  the  percentage  improvement  in  response  time  re¬ 
mains  fairly  constant  over  the  entire  range  of  values  of  the  two  parameters 
being  varied.  Thus,  the  percentage  improvement  is  (1104.09  -  1093.12)/1104.09, 
when  the  number  of  back-ends  is  three  and  the  number  of  requests  is  five.  For 
the  same  number  of  requests  and  six  back-ends,  the  percentage  improvement  is 
(490.2142  -  484. 0788)/490. 2142.  Both  these  represent  an  improvement  of  about 
11%.  A  look  at  the  controller  utilizations  in  Figures  25  and  26  tells  us  that 
the  controller  is  indeed  a  little  less  utilized  in  the  rotating  without  con¬ 
troller  strategy.  For  instance,  the  controller  utilization  when  the  number  of 
back-ends  is  three  and  the  number  of  requests  is  five  for  the  rotating  stra¬ 
tegy  is  16.17%.  On  the  other  hand,  the  controller  utilization  in  the  rotating 
without  controller  strategy  for  the  same  values  of  number  of  back-ends  and 
number  of  requests  is  15.13%. 

Let  us  now  consider  the  results  of  the  fully  duplicated  strategy  (Stra¬ 
tegy  E) .  As  we  recall,  this  strategy,  along  with  the  next  two  strategies,  are 
the  three  strategies  that  employ  parallel  descriptor  processing  during  the  des¬ 
criptors  search  phase.  Our  results  indicate  that  such  a  strategy  is  likely  to 

be  better  than  all  the  strategies  considered  thus  far.  Comparing  it  with  the 

best  strategy  considered  so  far  (the  rotating  without  controller  strategy) , 
we  see  that  the  fully  duplicated  strategy  can  lead  to  as  much  as  34%  lower 
response  time.  This  happens  when  the  number  of  back-ends  is  nine  and  the  num¬ 
ber  of  requests  is  15.  For  these  particular  number  of  back-ends  and  number  of 

requests,  the  response  time  using  the  fully  duplicated  strategy  is  810.6118 

msecs  whereas  the  response  time  using  the  rotating  without  controller  strategy 
is  1089.6635  msecs.  The  results  also  indicate  that  the  improvement  of  the 
fully  duplicated  strategy  over  the  rotating  without  controller  strategy  becomes 
more  evident  at  larger  number  of  back-ends  and  larger  number  of  requests.  In 
other  words,  the  fully  duplicated  strategy  is  more  extensible  than  the  other 
strategies  we  have  considered  so  far. 

The  utilization  of  the  disk  system  plays  an  important  part  in  the  disparity 
in  the  response  time  between  these  two  strategies.  The  utilization  of  the 
disks  in  the  rotating  without  controller  strategy  (the  fifth  row  of  each  table 
in  Figure  26)  is  a  little  higher  than  in  the  fully  duplicated  strategy  (See 
the  fifth  row  of  each  table  in  Figure  27).  However,  a  small  increase  in  the 
disk  utilization  can  Increase  the  overall  response  time  by  a  large  amount  owing 
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to  the  fact  that  the  disk  service  time  is  large.  This  is  one  of  the  causes 
of  the  better  response  time  in  the  fully  duplicated  strategy.  Another  reason 
for  the  better  response  time  is  the  fact  that  the  descriptor  search  phase  now 
takes  a  shorter  length  of  time.  For  instance,  when  the  number  of  back-ends 
is  nine  and  the  number  of  requests  is  five,  the  descriptor  search  time  in  the 
rotating  without  controller  strategy  is  18.247  msecs  (see  last  row  of  cor¬ 
responding  table  in  Figure  26)  while  it  is  only  14.975  msecs  (see  last  row  of 
corresponding  table  in  Figure  27)  in  the  fully  duplicated  strategy.  This  was 
the  reason  why  we  had  expected  the  fully  duplicated  strategy  to  perform  better 
than  the  rotating  without  controller  strategy.  The  fact  that  the  disk  system 
response  time  is  also  improved  is  unexpected. 

The  utilization  of  the  controller  is  also  seen  to  be  a  little  better  in 
the  case  of-  the  fully  duplicated  strategy.  Thus,  when  the  number  of  back-ends 
is  nine  and  the  number  of  requests  is  five,  the  total  controller  utilization 
is  40.09%  if  the  rotating  without  controller  strategy  is  used,  while  it  is 
only  35.02%  if  the  fully  duplicated  Strategy  is  used.  This  also  leads  to  some 
improvement  in  total  response  time. 

The  results  for  the  descriptors  division  by  attribute  strategy  (Strategy  F) 
are  exactly  the  same  as  for  the  fully  duplicated  strategy  (Strategy  E) . 

Next,  let  us  consider  the  results  for  the  descriptors  division  within  at¬ 
tribute  strategy  (Strategy  G) .  We  recall  that  this  strategy  also  employs 
parallel  descriptor  processing.  For  small  number  of  requests,  this  strategy 
performs  marginally  better  than  the  fully  duplicated  strategy.  For  large  num¬ 
ber  of  requests,  however,  the  fully  duplicated  strategy  outperforms  the  des¬ 
criptors  division  within  attribute  strategy.  Furthermore,  the  fully  duplicated 
strategy  outperforms  the  descriptors  division  within  attribute  strategy  the 
most,  when  the  number  of  back-ends  is  the  largest.  Thus,  when  the  number  of 
back-ends  is  nine  and  the  number  of  requests  is  fifteen,  the  response  time 
under  the  fully  duplicated  strategy  is  810.62  msecs  and  the  response  time  under 
the  descriptors  division  within  attribute  strategy  is  903.2192  msecs.  This  is 
an  improvement  of  10.3%. 

Finally,  let  us  consider  the  results  of  the  fully  replicated  strategy 
(Strategy  H) .  The  results  indicate  that  it  is  inferior  to  the  three  strategies 
which  employ  parallel  descriptor  processing  and  superior  the  the  other  four 
strategies. 

Consider,  once  again,  the  results  of  Figure  30  for  the  fully  duplicated 
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strategy  (Strategy  E) .  When  the  number  of  requests  is  five  and  the  number  of 
back-ends  is  three,  the  response  time  is  1076.78  msecs.  When  the  number  of 
back-ends  is  increased  to  six  and  the  number  of  requests  is  kept  unchanged, 
the  response  time  improves  to  428.63  msecs.  That  is,  when  the  number  of  back¬ 
ends  increases  to  twice  of  its  original  number,  the  response  time  of  MDBS 
improves  better  than  one  half  of  its  original  time.  This  is  a  surprising  re¬ 
sult.  Our  data  placement  strategy  only  ensures  us  that  the  doubling  of  the 
number  of  back-ends  may  halve  the  number  of  tracks  to  be  retrieved  at  each 
back-end.  Thus,  we  have  expected  that,  in  the  best  case,  a  doubling  of  the 
number  of  back-ends  would  lead  to  a  halving  of  the  response  time.  Hence,  the 
results  of  Figure  30  are  better  than  expected .  An  examination  of  the  results 
in  Figure  27  reveals  to  us  the  reason  for  the  better  than  expected  improvement 
in  response  time.  Once  again,  it  was  the  response  time  of  the  disk  system 
that  played  an  important  role  in  this  unexpected  result.  When  the  number  of 
back-ends  is  increased  as  indicated,  it  turns  out  that  the  disk  utilization 
decreases  from  99.72%  to  91.53%.  As  a  result,  the  disk  system  response  time 
decreases  from  164.72  msecs  per  request  to  115.21  msecs  per  request.  Thus, 
not  only  is  the  number  of  tracks  to  be  retrieved  decreased,  so  is  the  time  to 
retrieve  each  track.  This  explains  the  greater- than-expected  decrease  in  re¬ 
sponse  time.  Similar  reasons  account  for  the  fact  that  when  the  number  of 
back-ends  is  tripled  to  nine  while  keeping  the  number  of  requests  constant  at 
five,  the  response  time  is  shortened  to  263.8941  msecs  which  is  24.5%  of  the 
original  time  of  1076.7827  msecs  (we  have  expected  that  the  response  time  can 
be  no  better  than  33%  of  what  it  was  when  the  number  of  back-ends  was  three). 
In  other  words,  by  choosing  such  a  strategy  for  directory  management  in  MDBS, 
we  expect  that  an  increase  in  the  number  of  back-ends  in  MDBS  by  a  factor  of 
n  will  cause  an  improvement  in  response  time  which  will  be  better  than  a 
factor  of  n. 

In  conclusion,  the  results  indicate  that  the  fully  duplicated  strategy 
(Strategy  E)  and  the  descriptors  division  by  attribute  strategy  (Strategy  F) 
are  the  best  strategies  over  a  wide  range  of  values  for  the  number  of  requests 
and  the  number  of  back-ends.  The  choice  is  now  between  these  two  strategies. 
The  advantage  of  the  latter  strategy  is  that  it  would  occupy  less  storage 
space,  since  descriptors  are  not  duplicated.  However,  in  the  following  sec¬ 
tion,  we  will  study  the  difference  between  these  two  strategies  in  terms  of 
storage  requirements  for  typical  number  of  directory  attributes  and  number  of 


i 

I 

tt 


-145  - 


descriptors  per  attribute.  Furthermore,  in  another  section,  we  will  examine 
whether  the  duplication  of  descriptors,  as  in  the  fully  duplicated  strategy, 
is  necessary  for  efficient  handling  of  update  requests. 

4.2.4  Storage  Requirements  of  Directory  Management  Strategies 

In  this  section,  we  shall  compare  the  eight  strategies  on  the  basis  of 
their  storage  requirements.  As  we  have  already  indicated,  two  types  of  di¬ 
rectory  based  tables  are  necessary  in  MDBS.  The  first  type  of  table  is  the 
descriptor-to-descriptor-id  table  (DDIT)  and  the  second  type  of  table  is  the 
augmented  cluster  definition  table  (CDT) .  We  shall  discuss  for  each  strategy 
the  amount  of  storage  needed  for  both  types  of  tables,  in  turn. 

Primarily,  the  eight  different  directory  management  strategies  differ  in 
the  amount  of  storage  needed  for  storing  descriptor-to-descriptor-id  tables. 
Except  for  the  centralized  strategy,  there  is  no  difference  among  the  different 
strategies  in  the  amount  of  storage  needed  for  storing  the  augmented  cluster 
definition  tables.  Since  the  size  of  the  augmented  CDTs  is  much  larger  than 
the  size  of  the  descriptor-to-descriptor-id  tables,  it  is  expected  that  there 
will  be  no  significant  difference  among  the  different  strategies  in  terms  of 
storage  requirements.  Let  us  examine  them  more  carefully. 

A.  Size  Estimation  of  the  Descriptor-to-Descriptor-Id  Tables  (DDITs) 

We  assume  that  a  database  has  i  directory  attributes  and  that  there  are 
D  descriptors  per  attribute.  Let  k  be  the  number  of  descriptors  that  can  be 
stored  in  a  page  of  b  bytes.  All  strategies,  except  strategies  F  and  G,  re¬ 
quire  the  following  amount  of  storage  for  storing  the  descriptor  tables. 

S  =  u(^)bi  bytes 

where,  u(a)  stands  for  the  nearest  integer  equal  to  or  greater  than  a.  In 
Strategies  A  and  B,  DDITs  are  stored  entirely  at  the  controller.  Thus, 
the  size  of  the  tables  is  S  bytes.  In  Strategy  C,  these  tables  are 
duplicated  in  the  controller  and  at  all  the  back-ends.  Thus,  a  total  of 
(n+l)S  bytes  are  required  by  these  tables.  In  Strategies  D,  E  and  H,  the 
tables  are  duplicated  in  all  the  back-ends.  Hence,  in  these  strategies,  the 
total  storage  required  by  the  tables  is  nS  bytes.  In  Strategy  F,  the  total 
space  occupied  by  the  descriptor  tables  is: 

u  A  bn  u(— )  bytes  . 
k  n 


I 
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Finally,  for  Strategy  G,  the  storage  needed  Is 

u(^)bin  bytes. 

B.  Size  Estimation  of  the  Augmented  Cluster  Definition  Table  (CDT) 

We  recall  that  logically  the  augmented  CDT  consists  of  cluster  defi¬ 
nitions  and  corresponding  secondary  memory  addresses.  We  assume  that  the 
augmented  CDT  is  physically  implemented  as  follows.  Consider  a  database 
with  i  directory  attributes  and  D  descriptors  per  attribute.  Then  the 
augmented  CDT  consists  of  ID  entries.  Each  entry  is  formed  with  a  descrip¬ 
tor  id  followed  by  a  list  of  cluster  definitions  and  their  corresponding 
secondary  addresses.  Now,  the  number  of  bytes  needed  to  represent  a  cluster 

"  .dp.. 

Here,  lg  stands  for  the  logarithm  of  base  two.  In  the  ensuing  discussion, 
we  will  assume  that  t  bytes  are  needed  to  store  a  track  address  and  that  p 
bytes  are  needed  to  indicate  a  back-end  number.  Finally,  we  assume  that  the 
average  cluster  size  is  c  tracks,  where  c?l.  Thus,  the  size,  X,  of  each 
CDT  entry  for  the  centralized  strategy  is  given  by 

u(^jp.)  +  Di_1(u(i|S)i  +  c(t  +  p)) 

Thus,  the  total  size  of  the  augmented  CDT  in  the  centralized  strategy  is 
given  by  iDX. 

In  the  case  of  every  other  strategy,  the  size  of  an  augmented  CDT  entry  is 

u(i*p)  +  —  1(u(^)i  +  t),  if  c<n; 

u(^|2-)  +  D±_1(u(i|2.)i  +  — ),  otherwise, 
o  on 

If  we  denote  the  size  of  an  augmented  CDT  entry  for  all  strategies  except 
the  centralized  one  as  Y,  the  total  size  of  CDT  is  niDY  for  all  strategies 
except  the  centralized  one. 

The  following  table  shows  the  results  of  our  study  for  the  combined  size 
of  both  tables.  In  the  sequel,  we  shall  refer  to  the  combined  size  of  the 
DDITs  and  the  augmented  CDT  as  the  directory  size. 


STRATEGY 


DIRECTORY  SIZE 


A 

S  +  iDX 

B 

S  +  niDY 

C 

(n+l)S  +  niDY 

D 

nS  +  niDY 

E 

nS  +  niDY 

F 

u(p)bn  u(— )  +  niDY 
k  n 

G 

u(4-)bin  +  niDY 
nk 

H 

nS  +  niDY 

C.  Interpretation  of  the  Results  on  Sizes 

Let  us  now  proceed  to  perforin  some  actual  calculations  on  the  directory 
sizes.  We  let  i,  the  number  of  directory  attributes,  be  5,  b,  the  page  size, 
be  512  bytes,  k,  the  number  of  descriptors  per  page,  be  85,  t,  the  number  of 
bytes  in  a  track  address,  be  4;  and  p,  the  number  of  bytes  to  represent  a 
back-end  number,  be  1.  The  number  of  back-ends  is  taken  from  the  set  {2,  5,  8}, 
the  cluster  size  c  is  taken  from  the  {1,  2,  3,  4},  and  the  number  of  descrip¬ 
tors  D  is  taken  from  the  set  {10,  15,  20}. 

The  results  are  shown  in  Figure  31.  The  results  in  this  figure  show  the 
directory  sizes  in  K  bytes  for  various  number  of  descriptors  per  attribute, 
size  of  a  cluster  and  number  of  back-ends. 

Strategy  A  has  the  smallest  storage  requirements  when  the  size  of  a  cluster 
is  large.  However,  it  has  the  largest  storage  requirements  when  the  size  of  a 
cluster  is  small.  Because  of  its  very  large  worst  case  storage  requirements. 
Strategy  A  is  not  preferable  for  implementation  in  MDBS.  Let  us  now  consider 
the  storage  requirements  for  the  remaining  seven  strategies.  It  is  clear  that 
there  is  no  significant  difference  in  the  storage  requirements  for  these  seven 
strategies.  In  fact,  the  largest  difference  in  storage  requirements  among  these 
seven  strategies  is  .05%. 

Since  there  is  no  'superior*  strategy  in  terms  of  storage  requirements,  the 
strategy  chosen  for  Implementation  in  MDBS  should  be  the  one  which  is  found  to 
be  superior  in  terms  of  performance.  Our  results  have  shown  that  Strategies 
E  and  F  were  the  superior  ones  in  terms  of  performance.  In  the  following 
section,  it  will  be  seen  that  the  duplication  of  descriptors,  as  in  Strategy  E, 
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is  necessary  for  efficient  handling  of  some  kinds  of  update  requests.  Thus, 
we  choose  to  adopt  Strategy  E  for  directory  management  in  MDBS. 

4.3  The  Entire  Process  of  Request  Execution 

In  the  previous  section,  we  described  the  process  of  directory  management 
in  MDBS.  In  this  final  section,  we  shall  discuss  the  entire  sequence  of  actions 
performed  by  MDBS  in  processing  the  four  different  types  of  requests.  We  shall 
discuss  each  type  of  request,  in  turn. 

4.3.1  Executing  a  Retrieve  Request 

We  recall  that  the  syntax  of  a  retrieve  request  in  MDBS  is  as  follows. 

RETRIEVE  Query  Target-list  [BY  Attribute] [WITH  Pointer] 

The  sequence  of  actions  taken  by  MDBS  in  order  to  execute  a  retrieve  has  al¬ 
ready  been  described  in  some  detail  in  an  earlier  section.  We  shall  repeat 
some  of  the  discussion  here,  for  completeness. 

The  controller  will  first  parse  the  request  and  determine  that  it  is  a 
retrieve  request.  Next,  the  controller  will  broadcast  the  request  to  all  the 
back-ends.  The  back-ends  will  perform  descriptor  processing  and  address  gener¬ 
ation  under  Strategy  E  as  described  in  the  previous  section.  Upon  completion, 
each  back-end  has  a  list  of  secondary  memory  addresses  of  the  tracks  which 
contain  the  relevant  records.  These  tracks  are  accessed  by  the  back-end.  The 
query  in  the  request  is  used  to  select  the  records  from  these  tracks.  First, 
the  records  satisfying  the  query  are  selected.  If  a  BY-clause  is  specified  in 
the  retrieve  request,  the  selected  records  are  grouped  by  the  values  of  the  at¬ 
tribute  in  the  BY-clause.  If  no  BY-clause  is  specified  in  the  retrieve  request, 
all  the  selected  records  are  treated  as  a  single  group.  Next,  for  each  group 
of  selected  records,  the  values  of  all  attributes  in  the  target-list  are  ex¬ 
tracted  from  the  records  of  the  group.  If  no  aggregate  operator  is  specified 
on  an  attribute  in  the  target-list,  the  extracted  values  of  the  group 
are  returned  to  the  controller.  If  an  aggregate  operator  is  specified  on  an 
attribute  in  the  target-list,  some  computation  is  performed  on  all  the  attri¬ 
bute  values  In  the  records  of  the  group  and  a  single  aggregate  value  is  re¬ 
turned  to  the  controller.  This  completes  the  actions  performed  by  a  back-end 
on  each  group  of  selected  records.  If  a  WITH-clause  is  specified  in  the  re¬ 
trieve  request,  the  secondary  memory  addresses  of  all  selected  records  must 
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also  be  sent  to  the  controller  by  each  back-end*. 

The  controller  will  wait  for  responses  from  all  the  back-ends.  Upon  re¬ 
ceiving  all  the  responses  (i.e.,  attribute  values,  aggregate  values  or  ad¬ 
dresses)  from  all  back-ends,  the  controller  will  forward  these  responses  to 
the  user  that  issued  the  retrieve  request.  This  completes  the  execution  of 
the  retrieve  request. 

4.3.2  Executing  a  Delete  Request 

As  we  recall,  the  syntax  of  a  delete  request  is 

DELETE  Query 

The  execution  of  this  request  in  MDBS  is  similar  to  the  execution  of  a  re¬ 
trieve  request.  The  controller  will  first  parse  the  request  and  determine 
that  it  is  a  delete  request.  Next,  the  controller  will  broadcast  the  request 
to  all  back-ends.  The  back-ends  will  perform  descriptor  processing  and  ad¬ 
dress  generation  under  Strategy  E.  Upon  completion,  each  back-end  haa  a  list 
of  secondary  memory  addresses  of  tracks  which  contain  relevant  records.  Re¬ 
cords  of  these  tracks  are  retrieved  from  the  secondary  memory  by  respective  back¬ 
ends,  The  query  in  the  delete  request  is  used  to  select  the  records  which  are 
to  be  deleted.  These  selected  records  are  then  marked  for  deletion.  The  track 
space  occupied  by  the  marked  records  is  not  immediately  recovered.  Such  re- 
c  >very  of  space  will  be  done  during  database  reorganization  time  (see  Chapter 
7).  After  the  records  are  marked,  the  marked  records  are  written  back  to  the 
same  tracks  by  each  back-end.  If  all  the  records  in  a  track  are  marked  for 
deletion,  the  address  of  this  track  is  removed  from  all  entries  in  which  it 
appears  in  the  augmented  cluster  definition  table  (CDT).  Finally,  each  back¬ 
end  will  send  an  acknowledgement  to  the  controller  to  indicate  that  it  has 
finished  executing  the  delete  request.  Upon  receiving  the  acknowledgements 
from  all  the  back-ends,  the  controller  will  inform  the  user  or  user  program 
that  the  delete  request  has  successfully  been  completed. 

4.3.3.  Executing  an  Update  Request 

The  syntax  of  an  update  request  in  MDBS  Is  as  follows 

UPDATE  Query  Modifier 

We  recall  that  the  modifier  in  an  update  request  specifies  the  new  value  to 
be  taken  by  the  attribute  being  modified  and  that  it  may  be  one  of  the  types 
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described  below. 

Type-0  :  <attribute  *  constant> 

Type-I  :  <attribute  «  f (attrlbute)> 

Type-II  :  <attribute  *  f (attribute-l)> 

Type-III  :  <attribute  =  f (attribute-1)  of  Query> 

Type-IV  :  <attribute  *  f (attribute-1)  of  Pointer> 

In  the  simplest  case,  a  modifier  indicates  the  new  value  to  be  taken  by  the 
attribute  being  modified  (i.e. ,  type-0).  In  the  more  involved  cases,  the  mo¬ 
difiers  specify  the  new  value  to  be  taken  by  the  attribute  being  modified  as 
a  function  f  of  the  'old'  value  of  that  attribute  (i.e.,  type-I)  or  values  of 
some  other  attribute  of  the  record  to  be  updated  (e,g.,  types  II,  III  or  IV). 

The  other  attribute  is  called  the  base  attribute  (i. e. , attribute-1  in  the 
specification).  We  will  first  describe  the  execution  of  an  update  request 
containing  modifiers  of  types  0,  I  and  II.  We  will  then  describe  the  ex¬ 
ecution  of  an  update  request  containing  modifiers  of  type-III  or  IV. 

An  update  request  containing  a  modifier  of  types  0,  I  or  II  is  broad¬ 
cast  by  the  controller  to  all  the  back-ends.  The  back-ends  will  perform 
descriptor  processing  and  address  generation  under  Strategy  E.  Afterwards, 
each  back-end  has  a  list  of  secondary  memory  addresses  of  the  tracks  contain¬ 
ing  the  relevant  records.  These  tracks  are  accessed  by  respective  back¬ 
ends  and  the  records  satisfying  the  query  are  selected  from  these  tracks. 

These  are  the  records  to  be  updated. 

Each  of  these  records  is  updated  using  the  modifier  in  the  update  request. 
If  the  modifier  is  of  type-0,  the  new  value  to  be  taken  by  the  attribute  being 
modified  in  a  record  to  be  updated  is  provided  in  the  modifier.  If  the  modifier 
is  of  type-I,  the  new  value  to  be  taken  by  the  attribute  (being  modified)  in  a 
record  (to  be  updated)  is  computed  as  a  function  (specified  in  the  modifier) 
of  the  value  of  the  same  attribute.  Finally,  if  the  modifier  is  of  type-II, 

the  new  value  to  be  taken  by  the  attribute  (being  modified)  in  a  record  (to 
be  updated)  is  computed  as  a  function  f  (specified  in  the  modifier)  of  the 
value  of  the  base  attribute  in  that  record. 

Due  to  its  change  in  attribute  values,  an  updated  record  may  remain  in 
the  same  cluster  to  which  it  (more  precisely,  its  pre-updated  version)  be¬ 
longed  or  It  may  now  belong  to  a  different  cluster.  In  the  latter  case,  a 
record  is  said  to  change  cluster.  Recall  that  a  cluster  is  a  group  of  re¬ 
cords  such  that  every  record  in  the  cluster  is  derived  from  the  same  set  of 
descriptors.  Thus,  an  updated  record  will  belong  to  a  different  cluster  only 
if  the  set  of  descriptors  from  which  it  is  derived  is  different  from  the  set 
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of  descriptors  from  which  the  pre-updated  version  was  derived.  If  the  attri¬ 
bute  being  modified  in  an  updated  record  is  not  a  directory  attribute,  the 
updated  record  continues  to  be  derived  from  the  same  set  of  descriptors,  since 
only  directory  attributes  affect  the  descriptors.  Hence,  the  updated  record 
does  not  change  cluster.  If  the  attribute  being  modified  is  a  directory  attri¬ 
bute,  an  updated  record  may  change  cluster.  Consider,  for  example,  a  database 
with  the  descriptors  as  depicted  in  Figure  14.  Consider  also  the  following 
record 

(<Location,NAPA>,  <Number , 32123> ,  <Balance,50>) . 

This  record  is  clearly  derived  from  the  descriptor  set  {D2,  D3,  D5}  (see 
Figure  14  again).  After  updating  the  value  of  the  Balance  which  is  a  directory 
attribute  in  this  record,  the  updated  record  is  now  shown  below. 

(<Location,NAPA> ,  <Number,32123> ,  <Balance,100>) 

By  the  way,  this  update  uses  type  0  modifier.  The  updated  record  is  also  de¬ 
rived  from  the  same  descriptor  set  {D2,  D3,  D5).  Hence,  the  record  does  not 
change  cluster.  On  the  other  hand,  if  the  value  of  the  Balance  is  changed  to 
600,  the  newly  updated  record  is  not  derived  from  the  descriptor  set  {D2,  D3, 
05} .  Instead,  it  is  derived  from  the  set  {D 2 ,  04,  D5}.  Hence,  the  record 
changes  cluster. 

In  order  to  check  whether  or  not  a  newly  updated  record  changes  cluster, 
it  is  necessary  for  a  back-end  to  search  the  descriptor-to-descriptor-id  table 
(DDIT) .  To  facilitate  such  search,  we  have  decided  that  each  back-end  should 
replicate  the  descriptors  for  all  the  directory  attributes  in  its  secondary 
memory.  This  is  an  additional  reason  that  we  decided  on  Strategy  E.  For 
example,  if  Strategy  F  were  employed,  a  back-end  would  need  to  access  descrip¬ 
tors  placed  at  other  back-ends  in  order  to  determine  clusters  of  updated  re¬ 
cords.  Consequently,  bus  traffic  will  be  excessive  which  would  create  the 
so-called  control  message  problem. 

Finally,  each  back-end  will  send  an  acknowledgement  to  the  controller  to 
indicate  that  it  has  finished  processing  the  update  request.  When  it  has 
received  acknowledgements  from  all  back-ends,  the  controller  will  output  a 
message  to  the  user  to  signal  successful  completion  of  the  update  request. 

This  completes  the  processing  of  an  update  request  containing  modifiers  of 
type-0,  I  or  II. 

Now,  let  us  describe  the  execution  of  an  update  request  containing  a 
type-IIT  or  IV  modifier.  In  this  case,  another  record  must  first  be  retrieved 
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by  MDBS  on  the  basis  of  a  user-provided  query  or  pointer.  After  the  re¬ 
cord  is  retrieved,  the  controller  will  extract  the  base  attribute  value  v 
from  the  retrieved  record.  It  will  then  compute  the  function  f  (specified 
in  the  type-III  or  IV  modifier)  on  the  value  v  and  thus  obtain  a  new  value 
v'.  The  controller  will  then  form  a  type-0  modifier  of  the  form 

<a  =  v'> 

where  a  is  the  attribute  being  modified,  i.e.,  the  same  attribute  appeared  to 
the  left  of  the  equality  sign  in  the  type-III  or  IV  modifier.  The  original 
type-III  or  IV  modifier  in  the  update  request  is  now  replaced  with  this  newly 
created  type-0  modifier.  In  other  words,  MDBS  converts  an  update  request  con¬ 
taining  a  type-III  or  IV  modifier  to  an  update  request  containing  a  type-0 
modifier.  This  update  request  containing  a  type-0  modifier  may  now  be  exe¬ 
cuted  in  the  same  manner  described  previously. 

4.3.4  Executing  an  Insert  Request 

The  syntax  of  an  insert  request  in  MDBS  is 

INSERT  Record 

The  controller  will  first  parse  the  request  and  determine  that  it  is  an  insert 
request.  Next,  the  controller  will  broadcast  the  request  to  all  the  back¬ 
ends.  The  back-ends  will  perform  descriptor  processing  under  Strategy  E.  The 
descriptor  search  phase  under  Strategy  E  for  record  insertion  is  more  spe¬ 
cialized  than  the  descriptor  search  phase  under  the  same  strategy  for  other 
requests.  For  example,  for  the  retrieve  request,  the  descriptor  search 
phase  involves  parallel  descriptor  processing  of  multiple  predicates  of  a 
given  query  conjunction.  For  the  insert  request,  the  descriptor  search  phase 
merely  involves  parallel  descriptor  processing  of  multiple  attribute-value 
pairs  (i.e.,  keywords)  of  the  record  to  be  inserted.  The  specialized  descrip¬ 
tor  processing  for  record  insertion  tends  to  simplify  the  processing  effort. 
More  specifically,  if  the  record  for  insertion  contains  x  directory  attribu- 
tes,  each  back-end  will  determine  the  corresponding  descriptors  for  —  attri¬ 
bute-value  pairs.  At  the  end  of  the  descriptor  search  phase,  the  single  clus¬ 
ter  to  which  the  record  to  be  inserted  is  known  to  the  back-end (s)  whose  se¬ 
condary  memory (memories)  has(have)  been  accommodating  the  cluster.  The  reason 
that  more  than  one  back-end  may  be  involved  in  accommodating  the  cluster  in 
consideration  is  that  the  cluster  being  sufficiently  large  has  been  evenly 
distributed  by  the  data  placement  strategy  over  several  back-ends'  secondary 
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memories  at  the  database-creation  time.  Consequently,  MDBS  must  decide 
which  back-end's  secondary  memory  is  to  be  used  for  accommodating  the 
new  record. 

The  address  generation  phase  for  record  insertion  requires  an  addi¬ 
tional  step  under  Strategy  E.  Instead  of  generating  the  secondary  memory 
address  immediately  upon  the  completion  of  descriptor  processing,  each 
back-end  sends  at  most  one  cluster  id  of  the  corresponding  descriptors  to 
the  controller.  By  consulting  a  cluster- id-to-next-back-end  table  (CINBT), 
the  controller  can  select  the  secondary  memory  of  a  specific  back-end  for 
record  insertion.  The  CINBT  which  is  depicted  in  Figure  32  is  created  for 
the  controller  at  the  database-creation  time  by  the  data  placement  strategy. 
Upon  receiving  a  message  from  the  controller,  the  specific  back-end  will 
then  continue  into  the  address  generation  phase  by  producing  a  secondary 
address  for  record  insertion. 

Thus,  in  this  chapter,  we  have  completely  described  the  execution  of 
the  four  types  of  requests  that  may  be  issued  to  MDBS. 
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This  table  is  stored  at  the  controller  to  be  used  during 
record  insert  and  database  re-organization. 


Cluster  Id 

Next  Back-end 
for  Insertion 

Number  of 
Tracks* 

• 

• 

• 

• 

• 

• 

*  Information  in  this  and  additional  columns  will  be  used 
for  database  re-organization.  (See  Chapter  7  for 
re-organization) 


Figure  32.  The  Cluster-Id -To-Next-Back-end  Table  (CINBT) 
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APPENDIX  A:  FORMAL  SPECIFICATION  OF  DML 

The  following  is  the  BNF  syntax  for  the  DML  of  Chapter  3.  Square 
brackets  [  ]  are  used  to  indicate  optional  constructs. 


Predicate 

attribute 

attribute-being -mod if led 

base-attribute 

rel-op 

value 

Conjunct 

Query 

Stat 

list-el 

list 

Target-list 

Attr ib-val-pair 

Half -record 

Record 

Pointer 

Modifier 

type-0 

type-I 

type- I I 

type-III 

type- IV 

Request 

Insert 

Delete 

Update 

Retrieve 

uc-letter 

str ing 

lc-letter 

char -str ing 

digit 

number 

float 

add -op 

mult-op 

exprl 

arith-terml 

arith-factorl 

expr2 

arith-term2 

arith-factor2 


(attribute  rel-op  value) 

char-string 

attribute 

attribute 

<NI>M-|* 

str ing I  number | f loa t 

(Predicate) | (Conjunct  A  Predicate) 

(Conjunct) | (Query  V  Conjunct) 

AVG j MAX | MIN | SUM | COUNT 
Stat  (attribute) 

attr ibute | list-el | list, attribute | list, list-el 
(list) 

<  a  t  tr ibu  t  e , value> 

Attr ib-val-pair | Half-record ,  Attr ib-val-pair 

(Half-record) 

number 

type-o| type-I | type-II | type-III | type- IV 

<attribute-being-modif ied=value> 

<attribute-being-modif ied=exprl> 

<attribute-being-modified:Bexpr2> 

<attribute-being-modified*expr2  of  Query> 

<attribute-being-modif ied=expr2  of  Pointer> 

Insert | Delete | Update | Retrieve 

INSERT  Record 

DELETE  Query 

UPDATE  Query  Modifier 

RETRIEVE  Query  Target-list  [BY  Attribute] 

[WITH  Pointer] 

a|b|c|....|z 

uc-letter I uc-letter  string 
a|b|c|....|z 

uc-letter | char-string  lc-letter 
0|l|2|3|4|5|6|7|8|9 
digit |digit  number 
number .  number 
+  1- 

*1  •  . 

arith-terml | exprl  add-op  arith-terml 
arith-factorl | arith-terml  mult-op  arith-factorl 
attr ibute-being-mod if ied | number 
arith-term2 | expr2  add-op  arith-term2 
arith-factor2|arith-term2  mult-op  arith-factor2 
base-attribute | number 
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APPENDIX  B:  PROOF  THAT  A  RECORD  BELONGS  TO  ONE  AND  ONLY  ONE  CLUSTER 


Consider  a  record  R  consisting  of  the  directory  keywords  K  ,  K^, 

where  contains  directory  attribute  A^.  If  n  is  zero,  then  tne  record  R 
contains  no  directory  keywords.  Hence,  it  is  derived  from  the  cluster 
defined  by  the  empty  set  of  descriptors  and  this  is  the  only  cluster  it  is 
derived  from.  Next,  let  us  consider  the  case  when  n  is  non-zero.  For  pur¬ 
poses  of  contradiction,  let  a  record  R  belong  to  two  different  clusters  X 
and  Y.  Consider  directory  keyword  K1  in  R.  Since  R  belongs  to  cluster  X, 

X  must  contain  a  descriptor  Xj,  from  which  Kj  is  derived.  Similarly,  X  must 
contain  a  descriptor  X„  from  which  K2  is  derived.  Therefore,  X  is  defined 
by  the  descriptor  set  (Xj ,  X2,  ...  Xn}.  Similarly,  Y  is  defined  by  the  de¬ 
scriptor  set  {Y  ,  Y  ,  ....  Yn}.  Now,  by  the  definition  of  the  X^s  and  the 
Y^s,  the  following  n  statements  must  be  true. 


(1)  Kj  is  derived  from  both  Xj  and  Yj . 

(2)  K2  is  derived  from  both  X2  and  Y2> 


(n)  Kn  is  derived  from  both  Xjj  and  Yn. 

Using  the  statement  (1)  above,  we  shall  show  that  Xj  =  Y1 . 

The  first  point  to  be  noted  is  that  Xj  and  Yj  must  both  contain  Aj  , 
since,  K1  contains  Aj .  Let  Kj  be  of  the  form  <Aj,V>.  Now,  there  are 
three  possibilities  for  Xj  as  below. 

(a)  Xj  is  a  type-A  descriptor  of  the  form  (Lj  5  A  i  Uj), 

(b)  Xj  is  a  type-B  descriptor  of  the  form  (Aj  =  Vj). 

(c)  Xj  is  a  type-C  descriptor  of  the  form  Aj . 

Similarly,  there  are  three  possibilities  for  Yj  as  below. 

(a)  Yj  is  a  type-A  descriptor  of  the  form  (L2  5  Aj  <  U2) . 

(b)  Yj  is  a  type-B  descriptor  of  the  form  (Aj  *  V2). 

(c)  Yj  is  a  type-C  descriptor  of  the  form  Aj . 

Thus,  there  are  nine  possibilities  for  the  combination  of  Xj  and  Yj .  We 
shall  consider  all  of  the  nine  cases,  in  turn. 

Case  1:  Xj  is  (Lj  S  Aj  <  Dj). 

Yj  is  (L2  5  A2  <  U2). 

Now,  since  Kj  is  derived  from  Xj ,  V  lies  between  Li  and  U* .  Also, 
since  Kj  is  derived  from  Yj,  V  lies  between  L2  and  IJ2.  Rule  1  of  the 
rules  for  forming  descriptors  (see  Chapter  3)  states  that  the  ranges 
specified  in  type-A  descriptors  for  a  given  attribute  must  be  mutually 
exclusive.  Since  the  ranges  of  Xj  and  Yj  are  not  mutually  exclusive, 
they  cannot  be  two  different  descriptors.  Hence,  they  must  be  the 
same  descriptor.  Thus,  Xj  “  Yj . 

Case  2:  Xj  is  (Lj  SA,  <  Uj) 

Yj  is  (Aj  »  V2). 

Since  Kj  is  derived  from  Xj,  V  lies  between  Lj  and  U] .  Since  Kj  is 
is  derived  from  Yj,  Vj  must  be  equal  to  V?.  Thus,  V2  lies  between 
Lj  and  .  However,  this  is  in  violation  of  Rule  2  for  forming  des¬ 
criptors,  since,  there  is  a  type-B  descriptor  whose  value  is  enclosed 
in  the  range  of  a  type-A  descriptor.  Hence,  it  is  not  possible  that 
Xj  is  of  type-A  and  Yj  is  of  type-B. 
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Case  3:  Xj  is  (Lj  S  Aj  <  Uj). 

Yj  is  Aj . 

'inis  is  a  clear  violation  of  Rule  3  for  forming  descriptors  which 
states  that  an  attribute  that  appears  in  a  type-C  descriptor  cannot 
appear  also  in  a  type-A  or  type-B  descriptor.  Thus,  this  case  is 
not  possible. 

Case  4:  Xj  is  (Aj  =  Vj). 

Yj  is  (L2  <  A3  <  U2). 

Since  Kj  is  derived  from  X3 ,  V  must  be  equal  to  Vj .  However,  since 
K1  is  also  derived  from  Yj ,  V,  and  hence  Vj,  must  lie  between  L2  and 
U2.  This  is  a  violation  of  Rule  2  for  forming  descriptors.  Hence, 
this  case  is  not  possible. 

Case  5:  Xj  is  (Aj  =  Vj). 

Yj  is  ( Al  =  V2). 

Since  Kj  is  derived  from  Xj ,  V  must  be  equal  to  Vj.  But,  since  Kj 
is  derived  from  Yj  also,  V  is  also  equal  to  V2.  Hence,  V3  =  V2. 
Hence,  Xj  =  Yj . 

Case  6:  Xj  is  (A3  =  Vj) 

Yj  is  Aj . 

This  is  a  clear  violation  of  Rule  3  for  forming  descriptors  since  an 
attribute  that  appears  in  a  type-C  descriptor  cannot  also  appear  in 
in  a  type-A  descriptor.  Hence,  this  case  is  not  possible. 

Case  7 :  Xj  is  Aj. 

Yj  is  <L2  <  Aj  <  U2). 

This  is  again  a  violation  of  Rule  3  for  forming  descriptors  and, 
hence,  this  case  is  not  possible. 

Case  8:  Xj  is  Aj . 

Yj  is  (Aj  «  V2). 

Once  again,  we  have  a  violation  of  Rule  3  for  forming  descriptors. 
Hence,  this  case  is  not  possible. 

Case  9:  Xj  is  Aj. 

Yj  is  A! . 

Clearly,  Xj  =  Yj . 

Out  of  all  the  nine  cases  considered,  only  three  were  found  to  be  possible. 

In  each  of  these  three  cases,  Xj  was  found  to  he  equal  to  Yj.  Thus,  Xj  =  Y^. 

We  may  similarly  use  statement  (2)  to  show  that  X2  is  equal  to  Y2, 
statement  (3)  to  show  that  X3  is  equal  to  Y3,  and  so  on.  Thus,  X  =  Y.  Hence, 
R  does  not  belong  to  two  different  clusters.  Hence,  a  record  can  belong  to 
one  and  only  one  cluster. 
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APPENDIX  C:  DIRECTORY  MANAGEMENT  ALGORITHMS 

Here,  we  will  describe  two  algorithms.  The  first  algorithm  determines 
the  cluster  to  which  a  record  for  insertion  belongs  and  it  also  determines 
the  secondary  memory  addresses  of  this  cluster.  This  algorithm  is  called  the 
directory  management  algorithm  for  an  insert  request.  The  algorithm  is  pre¬ 
sented  as  two  different  procedures  because  it  proceeds  in  two  distinct  phases 
called  the  descriptor  search  phase  and  the  address  generation  phase . 

The  second  algorithm  describes  how  the  corresponding  set  of  clusters 
(and  their  addresses)  for  the  query  in  a  non-insert  request  are  determined. 
This  algorithm  is  called  the  directory  management  algorithm  for  a  non-insert 
request.  Once  again,  the  algorithm  is  presented  as  two  different  procedures. 

Three  tables  are  used  in  these  algorithms.  The  first  table  is  called 
the  pointer  table  (PT) .  It  has  as  many  entries  as  there  are  directory  attri¬ 
butes.  Each  entry  in  the  PT  consists  of  two  fields.  The  first  field  contains 
a  directory  attribute  and  the  second  field  contains  a  pointer  to  another  table 
called  the  descriptor-to-descriptor-id  table  (DDIT) .  The  pointer  in  the  se¬ 
cond  field  of  an  entry  in  PT  points  to  the  location  in  the  DDIT  where  the 
descriptors  of  the  attribute  in  the  first  field  of  the  PT  entry  are  stored. 

Tlie  DDIT  has  as  many  entries  as  there  are  descriptors.  Each  entry  con¬ 
sists  of  two  fields.  The  first  field  contains  a  descriptor  and  the  second 
field  contains  a  descriptor  id.  A  view  of  a  DDIT  is  shown  in  Figure  14. 

Finally,  there  is  a  table  called  the  cluster  definition  table  (CDT) . 

There  are  as  many  entries  in  the  CDT  as  there  are  clusters  in  the  database. 
Each  entry  in  the  CDT  consists  of  three  fields.  These  contain  a  cluster 
number,  a  cluster  definition  (as  a  descriptor  set)  and  one  or  more  secondary 
memory  addresses,  respectively.  The  secondary  memory  addresses  in  the  third 
field  are  of  the  cluster  in  the  first  field.  A  view  of  a  CDT  is  shown  in 
Figure  15. 

Physically,  we  assume  that  the  CDT  is  implemented  as  follows.  Consider  a 
database  with  i  directory  attributes  and  D  descriptors  per  attributes.  Then, 
the  CDT  consists  of  iD  entries.  Each  entry  is  formed  with  a  descriptor  id 
followed  by  a  list  of  cluster  definitions  and  their  corresponding  secondary 
memory  addresses. 


We  present  two  procedures  below.  The  first  one  is  executed  during  the 
descriptors  search  phase  and  the  second  one  is  executed  during  the  address 
generation  phase. 


PROCEDURE  DESCRIPTOR  SEARCH 

Purpose:  Given  a  record  for  insertion,  it  determines  the  corresponding 
descriptor  for  each  attribute-value  pair  in  the  record. 

Input:  The  record  to  be  inserted.  Let  there  be  n  attribute-value 

pairs  in  the  record  for  insertion.  Thus,  the  record  for 
insertion  is  (<Aj,Vj>,  <A2,V2>,  ...,  <An,Vn>) . 

Output:  A  set  of  k  descriptors,  where  k  is  the  number  of  directory 

attributes.  These  descriptors  are  output  in  the  array 
D(l) ,  D(2) ,  ....  D(k) . 
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Step  1:  Set  i«l.  Set  j*l. 

Step  2:  Consider  the  attribute-value  pair  <A^,Vi>.  Follow  the 

pointer  in  PT,  P^,  to  the  location  in  DDIT  where  the  des¬ 
criptors  of  are  stored.  Let  m=P-£.  Set  +  i  to  the 
next  pointer  in  PT  after  P^.  If  no  pointer  P^,  is  found, 
go  to  Step  4. 

Step  3:  Check  if  <Aj,V^>  can  be  derived  from  the  m-th  descriptor 

of  DDIT.  If  so,  set  D(j)  =  m-th  descriptor  of  DDIT,  j  =  j+1, 
and  go  to  Step  4.  Else,  go  to  step  5. 

Step  4:  i  =  i  +  1.  If  i  >  n,  then  go  to  Step  6.  Else,  go  to  Step  2. 

Step  5:  Set  m  =  m  +  1.  If  m*  P^  +  .,  go  to  Step  3.  Else,  set 

D(j)  “null  descriptor  on  A^,  j  =  j  +  1  and  go  to  Step  4. 

Step  6:  Output  D(l),  D(2),  ...,  D(k).  Terminate. 

PROCEDURE  ADDRESS  GENERATE 

Purpose:  Generates  the  cluster  to  which  the  record  for  insertion 
belongs  and  its  secondary  memory  address  or  addresses. 

Input:  D(l) ,  D(2),  ...,  D(k)  which  was  output  from  the  previous 

algorithm. 

Output:  The  cluster  to  which  the  record  belongs  and  the  secondary 

memory  addresses,  of  this  cluster. 


Step  1:  Consider  the  non-null  descriptors  in  D(l),  D(2) . D(k) 

and  let  these  be  NDj,  ND2,  ...  NDp.  Let  the  attributes  in 
the  non-null  descriptors  be  AT^ ,  AT2,  ...,  AT_.  Let  there 
be  d^  descriptors  in  the  database  for  attribute  AT^.  Choose 
that  attribute,  ATj ,  in  ATj ,  AT2,  ...  ATp  with  the  maximum 
number  of  descriptors  on  it,  d j . 

Step  2:  Access  the  CDT  entry  for  attribute  ATj  and  descriptor  ND  j . 

Search  this  entry  until  the  cluster  defined  by  D(l),  D(2),..., 
D(k)  is  encountered. 

Step  3:  Output  the  cluster  number  and  its  corresponding  secondary 
addresses  as  stored  in  the  entry.  Terminate. 


Directory  Management  for  a  Non-Insert  Request 


Once  again,  we  shall  present  two  procedures.  The  first  one  is  executed 
by  MDBS  during  the  descriptors  search  phase  and  the  second  one  is  executed 
during  the  address  generation  phase. 


PROCEDURE  DESCRIPTOR  SEARCH 


Purpose:  Given  a  non-insert  request  with  a  query,  it  determines  the 
corresponding  descriptor  for  each  predicate  in  the  query. 


Input:  The  query  in  disjunctive  normal  form.  Let  there  be  i  con¬ 

junctions  and  let  there  be  P,  predicates  in  the  j-th  con¬ 
junction.  J 
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Output:  A  two-dimensional  array  D  in  which  D(x,y)  contains  the 

corresponding  dewcriptor  for  the  y-th  predicate  of  the  x-th 
conjunction. 


Step  1:  Set  m  =  l.  [At  any  point,  the  m-th  conjunction  is  being  examined. 
Set  n=  1.  [At  any  point,  the  n-th  predicate  of  the  m-th  con¬ 
junction  is  being  examined.]  Set  k*l.  Let  the  array  D  be 
initially  empty. 

Step  2:  Let  the  attribute  in  the  n-th  predicate  of  the  m-th  conjunction 
be  A  and  let  its  value  be  V.  Use  PT  to  determine  the  location 
in  DDIT  where  the  descriptors  corresponding  to  A  are  stored. 

Set  P  and  START  to  this  value.  Set  END  to  the  location  in  DDIT 
where  the  descriptors  corresponding  to  A  end. 

Step  3:  Check  if  <A,V>  can  be  derived  from  the  p-th  descriptor  in  DDIT. 

If  so,  set  D(m,n)  to  the  p-th  descriptor  of  DDIT  and  go  to 
Step  4.  Else,  set  p  =  p+l.  If  p5END,  go  to  Step  3.  Else, 
go  to  Step  4. 

Step  4:  Set  n  =  n+l.  If  n>Pm,  then  go  to  Step  5.  Else,  go  to  Step  2. 

Step  5:  Set  m  =  m+l.  If  m>i,  then  stop.  Else,  set  n=l  and  go  to 
Step  2. 

PROCEDURE  ADDRESS  GENERATE 

Purpose:  Generates  the  clusters  which  will  satisfy  a  user  request  and 
their  secondary  memory  addresses. 

Input:  The  two-dimensional  array  D  from  the  previous  procedure  and 

the  user  query  of  i  conjunctions. 

Output:  The  clusters  which  will  satisfy  the  user  request  and  their 

secondary  memory  addresses. 

Step  1:  Set  m=l  [At  any  time,  we  examine  the  m-th  conjunction].  Let 
query  conjunction  m  have  Pm  predicates. 

Step  2:  Set  n=Pm.  Consider  the  descriptors  D(m,l),  D(m,2),  ..., 

D(m,n).  Let  the  attributes  in  these  descriptors  be  ATj ,  AT2> 

....  ATn.  Also,  let  D(tn,i),  lSiSn,  be  the  x^-th  descriptor 
on  AT^.  Also,  let  there  be  d^  descriptors  in  all,  in  the  data¬ 
base,  on  ATj. 

Step  3:  Associate  a  count  C^  with  each  AT^  as  follows.  If  the  i-th 
predicate  of  the  m-th  conjunction  contains  the  operator 
Cj  =  1/dj.  If  the  i-th  predicate  of  the  m-th  conjunction 
contains  the  operator  '<'  or  '5',  then  set  C^  =  x^d^.  Else, 
set  Cj  =  (dj  -  Xi)/d£. 

Step  4:  Choose  the  AT^  with  minimum  C£.  Access  the  entry  in  the  CDT 

for  attribute  ATj  and  descriptor  D(m,i).  If  the  i-th  predicate 
of  the  m-th  conjunction  is  '<’  or  's',  then  also  search  the 
entries  corresponding  to  attribute  AT*  and  all  descriptors  'less 
than’  D(m,i).  Else,  if  the  i-th  predicate  of  the  m-th  conjunction 
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1s  ’>'  or  ’>',  then  also  search  the  entries  corresponding 
to  attribute  AT^  and  all  descriptors  'greater  than'  D(m,i). 

Step  5:  For  each  cluster  encountered  in  searching  these  entries, 
do  Step  6. 

Step  6:  Set  p  =  l.  Consider  attribute  ATp  and  the  p-th  predicate  of 
the  i-th  conjunction.  If  the  predicate  is  then  check 

to  see  if  the  cluster  definition  includes  descriptor  D(m,p). 
If  the  predicate  is  or  '£',  then  check  to  see  if  the 

cluster  definition  includes  descriptor  D(m,p)  or  any  des¬ 
criptor  'less  than'  D(m,p).  Otherwise,  check  to  see  if  the 
cluster  definition  includes  descriptor  D(m,p)  or  any  des¬ 
criptor  'greater  than'  D(m,p).  If  the  checks  fail  at  any 
point,  then  the  cluster  being  examined  does  not  satisfy  the 
user  request.  Otherwise,  set  p  =  p  +  l.  If  p  >  n,  then  output 
the  cluster  being  examined  and  its  corresponding  address  (or 
addresses).  Else,  go  to  Step  6. 

Step  7:  Set  m=m+l.  If  m  >  i.  Terminate.  Else,  go  to  Step  2. 


k 
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APPENDIX  D:  ALGORITHM  FOR  BINARY  SEARCH  OF  DESCRIPTORS 

In  this  appendix,  we  shall  calculate  the  time  for  a  binary  search  of 
N  descriptors.  Let 

tpr  :  time  taken  to  perform  an  arithmetic  operation 
td  :  time  taken  to  read  a  descriptor 
N  :  number  of  descriptors 
K  :  descriptor  being  searched  for 
g(a)  :  the  nearest  integer  greater  than  or  equal  to  a 
h(a)  :  the  nearest  integer  less  than  or  equal  to  a 
lg  :  logarithm  of  the  base  2 

Let  the  N  descriptors  we  are  searching  be  Kj,  K2,  ...  K^.  Then,  the 
algorithm  for  binary  search  of  these  N  descriptors  is  as  follows. 

1.  i  =  g(N/2) ,  m  =  g(N/2) 

2.  If  K<  K.  go  to  3 

If  K  >  go  to  A 

If  K  =  success 

3.  If  m  ■  0  failure 
else  i  ■  i  -  g(m/2) 

m  -  h(m/2) 
go  to  2 

A.  If  m=*0  failure 
else  i *  i  +  g(m/2) 
m  =  h(m/'2) 
go  to  2 

From  the  above,  the  total  time  to  do  binary  search  on  N  descriptors  is 

2tpr  +  lg(N(td  +  Atpr)) 
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APPENDIX  E:  I/O  SUBMODEL  FOR  DISK  SYSTEM 

The  I/O  submodel  consisting  of  M  drive  queues  and  one  channel  queue  Is 
shown  in  Figure  20.  The  inter-arrival  distribution  is  chosen  as  exponential 
with  L  as  the  mean  rate  at  which  requests  are  received  by  the  disk  system. 

The  inter -arrival  distribution  of  requests  to  each  drive  queue  is  also  ex¬ 
ponential  with  a  mean  request  rate  of  L/M.  Let  us  analyze  the  drive  queues 
and  the  channel  queue,  in  turn. 

Analyzing  a  Drive  Queue  -  The  drive  queue  service  time  is  the  seek  time.  The 
seek  time  of  a  disk  drive  is  approximated  by  an  equation  of  the  form  (a  +  by), 
where  a  and  b  are  constants  and  y  is  the  number  of  cylinders  traversed  during 
the  seek.  We  let  N  be  the  total  number  of  cylinders  in  a  disk  drive.  Then, 

Mean  service  time  *  Mean  seek  time 

N  2  2v 

"  1  (a  +  by)(|  -  $f) 

y*i 

N  ,  N  N 

*  2a-||  Zy+f  IT* 

y=l  yKl  y=l 

a  ,  bN  b 

*  a  -  »  +  T-  3» 

Second  moment  of  service  time 


Z  (a2  +  b2y2  +  2aby)  -  -^r) 
y=l 


>2-t  + 


b2  2abN  2ab 
6  3  3N 


Variance  of  service  time 


=  Second  moment  of  service  time  -  (Mean  service  time)2 

b2N2  ,  b£  a2  b2  2ab  .  .  2ab 

18  18  ”  P"  "  9P"  ”  3N?  N  3 

Analyzing  the  Channel  Queue  -  It  is  easy  to  see  that 


Mean  service  time 
Variance  of  service  time 


disk  rotation  time 


Mean  Inter -arrival  time 


Variance  of  Inter-arrival  time 


The  channel  queue  is  analyzed  as  an  M/G/l  queue  [Klei75]  whose  waiting  time 
is 

Ld2 

2(1-Ld> 
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Let  total  time  in  the  channel  system  be  the  sum  of  the  channel  wait  and 
service  times.  Let 


w  ■  channel  wait  time 
x  *  channel  service  time 


s 

bT 
var (s) 


mean  time  in  channel 

second  moment  of  time  in  channel 

I?  _  (s)2  =  variance  of  time  in  channel 


Then, 


s? 


(w  +  x) 


W  +  X 

Ld2 


+  d 


2(1-Ld) 

(w  +  x)2  =  w2"  +  *7  +  2  x  w 


L2d4 


=  w^+d2  + 


Ld' 


Ld; 


(1-Ld) 


=  2 (1-Ld) ^  +  3 (1-Ld) 

Substituting  in  (1)  t 


^7  = 


L2d4 


Ld; 


2 (1-Ld) 2  3 (1-Ld) 


+  d2  + 


Ld' 


(1-Ld) 


Finally, 

var(s)  =*  s7  -  (s)2 


(1) 
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APPENDIX  F  :  ADDRESS  GENERATION  TIMES 

The  time  taken  for  address  generation  Is  really  the  time  taken  to 
search  the  augmented  CDT  for  the  addresses  of  the  corresponding  set  of 
clusters.  We  assume  that  the  augmented  CDT  is  physically  implemented  as 
follows.  Consider  a  database  with  i  directory  attributes  and  D  descrip¬ 
tors  per  attribute.  Then,  the  augmented  CDT  consists  of  iD  entries. 

Each  entry  is  formed  with  a  descriptor  id  followed  by  a  list  of  cluster 
definitions  and  their  corresponding  secondary  addresses.  Let 


i 

D 

t 

P 

c 

lg 

u(x) 

n 

s 

z 

X 

ad  gen 

y 

adgenl 


number  of  directory  attributes 

number  of  descriptors  per  attribute 

number  of  bytes  needed  to  store  a  track  address 

number  of  bytes  needed  to  indicate  a  back-end  number 

size  of  a  cluster  in  tracks 

logarithm  to  base  2 

nearest  integer  greater  than  or  equal  to  x 
number  of  back-ends 
track  size 

time  to  access  a  track  from  secondary  memory 

size  of  a  CDT  entry  in  the  centralized  strategy 

time  for  address  generation  in  the  centralized  strategy 

size  of  a  CDT  entry  in  the  other  strategies 

time  for  address  generation  in  the  other  strategies 


The  size,  x,  of  each  CDT  entry  in  the  centralized  strategy  is 
u(^|2)  +  D±-1(u(i|£)i  +  c(t  +  p)) 

So, 

,  xz 

adgen  ■  — 

Similarly,  the  size  y  of  each  CDT  entry  in  the  remaining  strategies  is 
u(ijp)  +^li  (u(i|E)i+t),  if  c<n; 

u(l|£)  +  D1_1(u(^)i  +  — ! ),  otherwise 
o  o  n 

Finally, 

adgenl  = 


